Ollama has released an update that leverages Apple’s MLX framework, significantly boosting local inference speeds on Macs with Apple silicon. Users can expect prefill to run up to 1.6 times faster and decode speeds to nearly double, with the largest gains on Macs with M5-series chips, whose GPUs include Neural Accelerators.
The improvements matter most for macOS users who rely on local AI tools such as personal assistants and coding agents. The new version promises better responsiveness during prolonged sessions, helped by improved memory management.
The preview release, designated Ollama 0.19, requires a Mac with more than 32GB of unified memory to run effectively. For now, the update supports only Alibaba’s Qwen3.5 model, with support for additional AI models planned for future releases.
