ONNX Runtime's CoreMLExecutionProvider silently converts models to FP16 precision when using the default NeuralNetwork format, causing prediction differences compared to PyTorch or CPU execution. The issue stems from CoreML's older NeuralNetwork format lacking explicit typing for intermediate layers, allowing Apple GPUs to execute those layers in their preferred FP16 precision.
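As a minimal sketch of why this matters (not from the post; plain NumPy, no CoreML required), the same matrix computation run in FP32 and FP16 produces visibly different outputs, which is the kind of drift you would see comparing CPU (FP32) inference against a backend that silently downcasts intermediates to FP16:

```python
import numpy as np

# Simulate one dense layer's matmul in FP32 vs FP16.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

y32 = w @ x  # FP32 reference, analogous to CPU execution
# Downcast weights and inputs to FP16 before the matmul,
# analogous to what an untyped FP16 backend does internally.
y16 = (w.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

max_abs_diff = float(np.max(np.abs(y32 - y16)))
print(max_abs_diff)  # nonzero: FP16 rounding and accumulation lose precision
```

For a single layer the discrepancy is small, but across a deep network these errors compound, which is how classification outputs can end up differing between providers.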

23 min read · From ym2132.github.io
Table of contents

- Uncovering an Issue in ONNX Runtime - Benchmarking the EyesOff Model
- Why am I Using ONNX and ONNX RunTime?
- Finding the Source of the CPU vs MPS Difference - With an MLP
- Where Does the Model Switch to FP16?
- The Fix - NeuralNetwork vs MLProgram CoreML Format
- Why MLProgram Format Worked and NeuralNetwork Didn't?
- But Why Does MLProgram Have Typed Layers?
- Takeaways
