ONNX Runtime's CoreMLExecutionProvider silently converts models to FP16 precision when using the default NeuralNetwork format, causing prediction differences compared to PyTorch or CPU execution. The issue stems from CoreML's older NeuralNetwork format lacking explicit typing for intermediate layers, allowing Apple GPUs to execute those layers in their preferred FP16 precision.
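As a minimal sketch of why this matters (not from the post; plain NumPy, no CoreML required), the same matrix computation run in FP32 and FP16 produces visibly different outputs, which is the kind of drift you would see comparing CPU (FP32) inference against a backend that silently downcasts intermediates to FP16:

```python
import numpy as np

# Simulate one dense layer's matmul in FP32 vs FP16.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

y32 = w @ x  # FP32 reference, analogous to CPU execution
# Downcast weights and inputs to FP16 before the matmul,
# analogous to what an untyped FP16 backend does internally.
y16 = (w.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

max_abs_diff = float(np.max(np.abs(y32 - y16)))
print(max_abs_diff)  # nonzero: FP16 rounding and accumulation lose precision
```

For a single layer the discrepancy is small, but across a deep network these errors compound, which is how classification outputs can end up differing between providers.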

23 min read · From ym2132.github.io
Table of contents

- Uncovering an Issue in ONNX Runtime - Benchmarking the EyesOff Model
- Why am I Using ONNX and ONNX RunTime?
- Finding the Source of the CPU vs MPS Difference - With an MLP
- Where Does the Model Switch to FP16?
- The Fix - NeuralNetwork vs MLProgram CoreML Format
- Why MLProgram Format Worked and NeuralNetwork Didn't?
- But Why Does MLProgram Have Typed Layers?
- Takeaways
