Automated Dataset Generation, Object Detection, and Mobile Deployment
Overview
As a Machine Learning Engineer at Zortag, I automated dataset generation, optimized object detection models, and shipped an iPhone app for real-time inference. Zortag’s 3D QR stack needs high-accuracy authentication; the same product flow also needed the phone to know when a laptop or monitor screen was in view. That second problem sounds simple, but in practice screens are not “nice” detection targets: large regions, uneven texture, glare, and moiré meant a YOLO-style bounding-box model we tried first kept missing or jittering on real hardware.
I addressed that with a CLIP ViT-L/14@336 image encoder feeding a small MLP for a stable yes/no screen decision, with frames streamed over WebSockets, multi-crop inference at test time, EMA smoothing, and a short hysteresis window so the UI did not flicker—end-to-end roughly sub-second behavior on live video. The robotics and QR pieces are what most people see first; the screen pathway was the parallel engineering thread that made the whole authentication story usable outside a lab.
Key Components:
- Robotic arm automation for scalable data collection
- YOLOv8 fine-tuning (99.84% accuracy) for QR / label tasks
- Synthetic data generation
- iPhone app: CoreML + SwiftUI, QR inference plus the CLIP + MLP screen head above (video demo at the beginning)
Note: Code and datasets are proprietary.
Key Contributions
1. Robotic Automation & Dataset Generation
- Programmed myCobot 280 PI to capture multi-angle images of 3D QR codes via human-like motions.
- Motorized label rewinders simulated real-world movement, reducing manual effort by 90%.
2. Model Optimization
- Fine-tuned YOLOv8 for anomaly detection using hybrid real/synthetic data, achieving 99.84% detection accuracy for fake 3D QR codes.
- Automated labeling pipelines eliminated manual annotation errors.
3. iPhone App for Inference
- Developed a native iOS app to run optimized YOLOv8 (CoreML) for 3D QR work on-device.
- Added live screen / display detection for field use: CLIP + MLP with streaming, multi-crop evaluation, and temporal smoothing so the app could trust the signal in uneven lighting.
- Offline-capable where the models allow it; SwiftUI UI aimed at operators in the field.
Methodology
- Robotics: Python/ROS-controlled arm trajectories with OpenCV-based image capture.
- ML Pipeline:
- Synthetic data augmentation (varied lighting/backgrounds).
- YOLOv8 fine-tuning via PyTorch and CoreML conversion for iOS.
- Mobile Deployment:
- Optimized model for edge performance (CoreML).
- SwiftUI frontend with camera integration.
Results & Impact
- Efficiency: 90% faster data generation vs. manual methods.
- Accuracy: 99.84% detection under real-world conditions.
- Deployment: iPhone app enables field authentication without cloud dependency.
Future Work
- Expand the training dataset to cover more real-world cases.
- Port to Android for broader adoption.