Vision Framework API Reference
Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.
When to Use This Reference
- Implementing subject lifting using VisionKit or Vision
- Detecting hand/body poses for gesture recognition or fitness apps
- Segmenting people from backgrounds or separating multiple individuals
- Detecting faces and landmarks for AR effects or authentication
- Combining Vision APIs to solve complex computer vision problems
- Looking up specific API signatures and parameter meanings
- Recognizing text in images (OCR) with VNRecognizeTextRequest
- Detecting barcodes and QR codes with VNDetectBarcodesRequest
- Building live scanners with DataScannerViewController
- Scanning documents with VNDocumentCameraViewController
- Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)
Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting
Vision Framework Overview
Vision provides computer vision algorithms for still images and video:
Core workflow (a complete sketch follows this list):
1. Create a request (e.g., VNDetectHumanHandPoseRequest())
2. Create a handler with the image (VNImageRequestHandler(cgImage: image))
3. Perform the request (try handler.perform([request]))
4. Access observations via request.results

Coordinate system: lower-left origin, normalized (0.0-1.0) coordinates

Performance: run requests on a background queue; they are resource-intensive and will block the UI if performed on the main thread
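A minimal sketch of this workflow, assuming a hand-pose request, a background dispatch queue, and UIKit display; the function names and completion pattern are illustrative, not part of the Vision API:

```swift
import UIKit
import Vision

// Runs the four-step workflow off the main thread and hops back for UI work.
func detectHandPose(in image: CGImage,
                    completion: @escaping ([VNHumanHandPoseObservation]) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectHumanHandPoseRequest()        // 1. Create request
        let handler = VNImageRequestHandler(cgImage: image) // 2. Create handler
        do {
            try handler.perform([request])                  // 3. Perform
            let observations = request.results ?? []        // 4. Read observations
            DispatchQueue.main.async { completion(observations) }
        } catch {
            DispatchQueue.main.async { completion([]) }
        }
    }
}

// Vision's lower-left, normalized coordinates flipped into UIKit's
// upper-left view space (hypothetical helper):
func viewPoint(fromVisionPoint p: CGPoint, in viewSize: CGSize) -> CGPoint {
    CGPoint(x: p.x * viewSize.width, y: (1 - p.y) * viewSize.height)
}
```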
Subject Segmentation APIs
VNGenerateForegroundInstanceMaskRequest
Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+

Generates a class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)
Basic Usage
```swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
```
VNInstanceMaskObservation
- allInstances: IndexSet containing all foreground instance indices (excludes background, index 0)
- instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
- instanceAtPoint(_:): Returns the instance index at a normalized point

```swift
let point = CGPoint(x: 0.5, y: 0.5) // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}
```
Generating Masks
createScaledMask(for:croppedToInstancesContent:)
Parameters:
- for: IndexSet of instances to include
- croppedToInstancesContent:
  - false = output matches input resolution (for compositing)
  - true = tight crop around the selected instances
Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
```swift
// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)
```
Instance Mask Hit Testing
Access raw pixel buffer to map tap coordinates to instance labels:
```swift
let instanceMask = observation.instanceMask

CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let height = CVPixelBufferGetHeight(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert the normalized tap to mask-buffer pixel coordinates
// (use the mask's own dimensions; it may be smaller than the source image)
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    width,
    height
)

// Calculate the byte offset of that pixel
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)

// Read the instance label
let label = UnsafeRawPointer(baseAddress!).load(
    fromByteOffset: offset,
    as: UInt8.self
)

let instances = label == 0
    ? observation.allInstances
    : IndexSet(integer: Int(label))
```
VisionKit Subject Lifting
ImageAnalysisInteraction (iOS)
Availability: iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
```swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject // Or .automatic
imageView.addInteraction(interaction)
```
Interaction types:
- .automatic: Subject lifting + Live Text + data detectors
- .imageSubject: Subject lifting only (no interactive text)
ImageAnalysisOverlayView (macOS)
Availability: macOS 13+
```swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
```
Programmatic Access
ImageAnalyzer
```swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

let analysis = try await analyzer.analyze(image, configuration: configuration)
```
ImageAnalysis
- subjects: [Subject] (all subjects in the image)
- highlightedSubjects: Set<Subject> (currently highlighted, e.g. after a user long-press)
- subject(at:): Async lookup of the subject at a normalized point (returns nil if none)
```swift
// Get all subjects
let subjects = analysis.subjects

// Look up the subject at a tap point
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
```
Subject Struct
- image: UIImage/NSImage (extracted subject with transparency)
- bounds: CGRect (subject boundaries in image coordinates)

```swift
// Single subject image
let subjectImage = subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
```
Out-of-process: VisionKit analysis runs out-of-process, which improves app performance but limits the maximum image size
Person Segmentation APIs
VNGeneratePersonSegmentationRequest
Availability: iOS 15+, macOS 12+
Returns single mask containing all people in image:
```swift
let request = VNGeneratePersonSegmentationRequest()
// Configure request.qualityLevel if needed (.accurate, .balanced, .fast)
try handler.perform([request])

guard let observation = request.results?.first as? VNPixelBufferObservation else { return }
let personMask = observation.pixelBuffer // CVPixelBuffer
```
VNGeneratePersonInstanceMaskRequest
Availability: iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
```swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }

// Same VNInstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances // Up to 4 people (indices 1-4)

// Get the mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)
```
Limitations:
- Segments up to 4 people
- With more than 4 people, it may miss or merge individuals (typically those in the background)
- Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes (see the sketch below)
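A sketch of that fallback, using the createScaledMask API documented above; the 4-face threshold mirrors the instance limit, and the function shape is illustrative:

```swift
import Vision

// Count faces first, then decide whether per-person instance masks are reliable.
func segmentPeople(in image: CGImage) throws {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let instanceRequest = VNGeneratePersonInstanceMaskRequest()
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([faceRequest, instanceRequest])

    let faceCount = faceRequest.results?.count ?? 0
    guard let observation = instanceRequest.results?.first else { return }

    if faceCount > 4 {
        // More people than the 4-instance limit: fall back to one combined mask
        let combined = try observation.createScaledMask(
            for: observation.allInstances,
            croppedToInstancesContent: false
        )
        _ = combined
    } else {
        // Safe to treat each instance as a distinct person
        for index in observation.allInstances {
            let mask = try observation.createScaledMask(
                for: IndexSet(integer: index),
                croppedToInstancesContent: false
            )
            _ = mask
        }
    }
}
```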
Hand Pose Detection
VNDetectHumanHandPoseRequest
Availability: iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
```swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2 // Default: 2, increase if needed

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
    // Process each hand
}
```
Performance note: maximumHandCount affects latency; pose is computed only for up to that many hands. Set it to the lowest value your use case allows.
Hand Landmarks (21 points)
Wrist: 1 landmark
Thumb (4 landmarks):
- .thumbTip
- .thumbIP (interphalangeal joint)
- .thumbMP (metacarpophalangeal joint)
- .thumbCMC (carpometacarpal joint)

Fingers (4 landmarks each):
- Tip (.indexTip, .middleTip, .ringTip, .littleTip)
- DIP (distal interphalangeal joint)
- PIP (proximal interphalangeal joint)
- MCP (metacarpophalangeal joint)
Group Keys
Access landmark groups:
| Group Key | Points |
|---|---|
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
```swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)

// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)

// Get specific points
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

// Check confidence
guard thumbTip.confidence > 0.5 else { return }

// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location // CGPoint
```
Gesture Recognition Example (Pinch)
```swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else { return }

let distance = hypot(
    thumbTip.location.x - indexTip.location.x,
    thumbTip.location.y - indexTip.location.y
)

let isPinching = distance < 0.05 // Normalized threshold
```
Chirality (Handedness)
```swift
let chirality = observation.chirality // .left, .right, or .unknown
```
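A small usage sketch, splitting detected hands by handedness (assumes the request above has already been performed):

```swift
// Split detected hands by chirality
let hands = request.results ?? []
let rightHands = hands.filter { $0.chirality == .right }
let leftHands = hands.filter { $0.chirality == .left }
```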
Body Pose Detection
VNDetectHumanBodyPoseRequest (2D)
Availability: iOS 14+, macOS 11+
Detects 19 body landmarks (2D normalized coordinates):
```swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
    // Process each person
}
```
Body Landmarks (19 points)

Face (5 landmarks):
- .nose, .leftEye, .rightEye, .leftEar, .rightEar

Arms (6 landmarks):
- Left: .leftShoulder, .leftElbow, .leftWrist
- Right: .rightShoulder, .rightElbow, .rightWrist

Torso (6 landmarks):
- .neck (between the shoulders)
- .leftShoulder, .rightShoulder (also in the arm groups)
- .leftHip, .rightHip (also in the leg groups)
- .root (between the hips)

Legs (6 landmarks):
- Left: .leftHip, .leftKnee, .leftAnkle
- Right: .rightHip, .rightKnee, .rightAnkle
Note: Shoulders and hips appear in multiple groups
Group Keys (Body)
| Group Key | Points |
|---|---|
| .all | All 19 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
```swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)

// Get the left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)

// Get a specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
```
VNDetectHumanBodyPose3DRequest (3D)
Availability: iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
```swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else { return }

// Get a 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position           // simd_float4x4 matrix
let localPosition = leftWrist.localPosition // Relative to parent joint
```
3D Body Landmarks (17 points): similar to the 2D set, but the five face points are replaced by head and spine joints (17 vs 19 2D landmarks)
3D Observation Properties
- bodyHeight: Estimated height in meters
  - With depth data: measured height
  - Without depth data: reference height (1.8 m)
- heightEstimation: .measured or .reference
- cameraOriginMatrix: simd_float4x4 camera position/orientation relative to the subject
- pointInImage(_:): Projects a 3D joint back to 2D image coordinates

```swift
let wrist2D = try observation.pointInImage(leftWrist)
```
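A short sketch of reading the height estimate, assuming the `observation` from the request above; the logging is a placeholder:

```swift
switch observation.heightEstimation {
case .measured:
    print("Measured height: \(observation.bodyHeight) m")   // Depth data was available
case .reference:
    print("Reference height used: \(observation.bodyHeight) m") // Defaults to 1.8 m
@unknown default:
    break
}
```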
3D Point Classes
- VNPoint3D: Base class with a simd_float4x4 position matrix
- VNRecognizedPoint3D: Adds identifier (joint name)
- VNHumanBodyRecognizedPoint3D: Adds localPosition and parentJoint

```swift
// Position relative to the skeleton root (center of the hips)
let modelPosition = leftWrist.position

// Position relative to the parent joint (the left elbow)
let relativePosition = leftWrist.localPosition
```
Depth Input
Vision accepts depth data alongside images:
```swift
// From AVDepthData
let handler = VNImageRequestHandler(
    cvPixelBuffer: imageBuffer,
    depthData: depthData,
    orientation: orientation
)
```

```swift
// From a file (depth is extracted automatically if present)
let handler = VNImageRequestHandler(url: imageURL)
```
Depth formats: Disparity or Depth (interchangeable via AVFoundation)
LiDAR: Use in live capture sessions for accurate scale/measurement
Face Detection & Landmarks
VNDetectFaceRectanglesRequest
Availability: iOS 11+
Detects face bounding boxes:
```swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    let faceBounds = observation.boundingBox // Normalized rect
}
```
VNDetectFaceLandmarksRequest
Availability: iOS 11+
Detects face with detailed landmarks:
```swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    if let landmarks = observation.landmarks {
        let leftEye = landmarks.leftEye
        let nose = landmarks.nose
        let leftPupil = landmarks.leftPupil // Revision 3+
    }
}
```
Revisions:
- Revision 1: Basic landmarks
- Revision 2: Detects upside-down faces
- Revision 3+: Pupil locations (see the sketch below)
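A sketch of pinning a landmarks revision; the revision constant and supportedRevisions are real Vision API, while the check-then-set pattern is just one option:

```swift
let request = VNDetectFaceLandmarksRequest()
if VNDetectFaceLandmarksRequest.supportedRevisions.contains(VNDetectFaceLandmarksRequestRevision3) {
    request.revision = VNDetectFaceLandmarksRequestRevision3 // Adds pupil locations
}
```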
Person Detection
VNDetectHumanRectanglesRequest
Availability: iOS 13+
Detects human bounding boxes (torso detection):
```swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanObservation] ?? [] {
    let humanBounds = observation.boundingBox // Normalized rect
}
```
Use case: Faster than pose detection when you only need location
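By default the request reports upper-body (torso) rectangles. A sketch of requesting full bodies instead, assuming iOS 15+ where the upperBodyOnly property is available:

```swift
let request = VNDetectHumanRectanglesRequest()
if #available(iOS 15.0, *) {
    request.upperBodyOnly = false // Full-body rectangles instead of torso-only
}
```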
CoreImage Integration
CIBlendWithMask Filter
Composite subject on new background using Vision mask:
```swift
// 1. Get the mask from Vision
guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
let visionMask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)

// 3. Apply the filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)

let output = filter.outputImage // Composited result
```
Parameters:
- Input image: Original image to mask
- Mask image: Vision's soft segmentation mask
- Background image: New background (or an empty image for transparency)
HDR preservation: CoreImage preserves high dynamic range from input (Vision/VisionKit output is SDR)
Text Recognition APIs
VNRecognizeTextRequest
Availability: iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.
Basic Usage
```swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"] // Order matters
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
    // Get top candidates
    let candidates = observation.topCandidates(3)
    let bestText = candidates.first?.string ?? ""
}
```
Recognition Levels
| Level | Performance | Accuracy | Best For |
|---|---|---|---|
| .fast | Real-time | Good | Camera feed, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path: Character-by-character recognition (Neural Network → Character Detection)
Accurate path: Full-line ML recognition (Neural Network → Line/Word Recognition)
Properties
| Property | Type | Description |
|---|---|---|
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |
Language Support
```swift
// Check supported languages for the current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision3
)
```
Language correction: Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words: Add domain-specific vocabulary for better recognition (medical terms, product codes).
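A sketch contrasting the two settings; the vocabularies and height threshold are hypothetical (note that customWords takes effect during language correction):

```swift
// Domain vocabulary: keep correction on and supplement its lexicon
let medicalRequest = VNRecognizeTextRequest()
medicalRequest.usesLanguageCorrection = true
medicalRequest.customWords = ["acetaminophen", "naproxen"] // Hypothetical vocabulary

// Serial numbers: turn correction off so valid codes aren't "fixed"
let serialRequest = VNRecognizeTextRequest()
serialRequest.usesLanguageCorrection = false
serialRequest.minimumTextHeight = 0.05 // Hypothetical: skip tiny background text
```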
VNRecognizedTextObservation
- boundingBox: Normalized rect containing the recognized text
- topCandidates(_:): Returns [VNRecognizedText] ordered by confidence
VNRecognizedText
| Property | Type | Description |
|---|---|---|
| string | String | Recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Box for a substring range |
```swift
// Get the bounding box for a substring
let text = candidate.string
if let range = text.range(of: "invoice") {
    let box = try candidate.boundingBox(for: range)
}
```
Barcode Detection APIs
VNDetectBarcodesRequest
Availability: iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.
Basic Usage
```swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128] // Limit to specific codes

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for barcode in request.results as? [VNBarcodeObservation] ?? [] {
    let payload = barcode.payloadStringValue
    let type = barcode.symbology
    let bounds = barcode.boundingBox
}
```
Symbologies
1D Barcodes:
- .codabar (iOS 15+)
- .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
- .code93, .code93i
- .code128
- .ean8, .ean13
- .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
- .i2of5, .i2of5Checksum
- .itf14
- .upce

2D Codes:
- .aztec
- .dataMatrix
- .microPDF417 (iOS 15+)
- .microQR (iOS 15+)
- .pdf417
- .qr
Performance: Specifying fewer symbologies = faster detection
Revisions
| Revision | iOS | Features |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |
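A sketch of opting into the newest revision when available; the constants are real Vision API, and older OS versions simply keep their default:

```swift
let request = VNDetectBarcodesRequest()
if VNDetectBarcodesRequest.supportedRevisions.contains(VNDetectBarcodesRequestRevision3) {
    request.revision = VNDetectBarcodesRequestRevision3 // ML-based, detects multiple codes
}
```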
VNBarcodeObservation
| Property | Type | Description |
|---|---|---|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft / topRight / bottomLeft / bottomRight | CGPoint | Corner points |
VisionKit Scanner APIs
DataScannerViewController
Availability: iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.
Check Availability
```swift
// Hardware support
DataScannerViewController.isSupported

// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
```
Configuration
```swift
import VisionKit

let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
    .barcode(symbologies: [.qr, .ean13]),
    .text(textContentType: .URL) // Or .text() for all text, .text(languages: ["ja"]) to filter
]

let scanner = DataScannerViewController(
    recognizedDataTypes: dataTypes,
    qualityLevel: .balanced, // .fast, .balanced, or .accurate
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isPinchToZoomEnabled: true,
    isGuidanceEnabled: true,
    isHighlightingEnabled: true
)

scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```
RecognizedDataType
| Type | Description |
|---|---|
| .barcode(symbologies:) | Specific barcode types |
| .text() | All text |
| .text(languages:) | Text filtered by language |
| .text(textContentType:) | Text filtered by type (URL, phone, email) |
Delegate Protocol
```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```
RecognizedItem
```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```
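A sketch of a delegate method consuming these items; the print statements are placeholders:

```swift
func dataScanner(_ dataScanner: DataScannerViewController, didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("Tapped text: \(text.transcript)")
    case .barcode(let barcode):
        print("Tapped barcode: \(barcode.payloadStringValue ?? "<no string payload>")")
    @unknown default:
        break
    }
}
```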
Async Stream
```swift
// Alternative to the delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```
Custom Highlights
```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture a still photo
let photo = try await scanner.capturePhoto()
```
VNDocumentCameraViewController
Availability: iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.
Basic Usage
```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```
Delegate Protocol
```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
}
```
VNDocumentCameraScan
| Property | Type | Description |
|---|---|---|
| pageCount | Int | Number of scanned pages |
| imageOfPage(at:) | UIImage | Get page image at index |
| title | String | User-editable title |
```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)

    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest
    }
}
```
Document Analysis APIs
VNDetectDocumentSegmentationRequest
Availability: iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.
```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return // No document found
}

// Get the corner points (normalized)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```
vs VNDetectRectanglesRequest:
- Document: ML-based, trained specifically on documents
- Rectangle: Edge-based, finds any quadrilateral
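A common follow-up is deskewing the detected document with Core Image's CIPerspectiveCorrection filter. A sketch, assuming the observation came from the same CIImage (both Vision and Core Image use a lower-left origin, so scaling the normalized corners to pixels is enough); the helper names are illustrative:

```swift
import CoreImage
import Vision

func correctedDocument(from image: CIImage, observation: VNRectangleObservation) -> CIImage? {
    let size = image.extent.size
    func scaled(_ p: CGPoint) -> CIVector {
        CIVector(x: p.x * size.width, y: p.y * size.height) // Normalized -> pixels
    }

    let filter = CIFilter(name: "CIPerspectiveCorrection")!
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(scaled(observation.topLeft), forKey: "inputTopLeft")
    filter.setValue(scaled(observation.topRight), forKey: "inputTopRight")
    filter.setValue(scaled(observation.bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(scaled(observation.bottomRight), forKey: "inputBottomRight")
    return filter.outputImage
}
```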
RecognizeDocumentsRequest (iOS 26+)
Availability: iOS 26+, macOS 26+
Structured document understanding with semantic parsing.
Basic Usage
```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else { return }
```
DocumentObservation Hierarchy
```
DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]
```
Table Extraction
```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```
Detected Data Types
```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```
TextObservation Hierarchy
```
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
```
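A hedged sketch of walking this hierarchy; the per-node transcript property is an assumption based on the tree above (verify against the iOS 26 SDK):

```swift
let fullText = document.text.transcript
for line in document.text.lines {
    print(line.transcript) // Assumes lines expose a transcript like their parent
}
```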
API Quick Reference
Subject Segmentation
| API | Platform | Purpose |
|---|---|---|
| VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
| VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
| VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
| ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |
Pose Detection
| API | Platform | Landmarks | Coordinates |
|---|---|---|---|
| VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
| VNDetectHumanBodyPoseRequest | iOS 14+ | 19 body joints | 2D normalized |
| VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |
Face & Person Detection
| API | Platform | Purpose |
|---|---|---|
| VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
| VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
| VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |
Text & Barcode
| API | Platform | Purpose |
|---|---|---|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |
Observation Types
| Observation | Returned By |
|---|---|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |
Resources
WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills: axiom-vision, axiom-vision-diag