Ollama: Native MLX Backend for Apple Silicon
Ollama now runs natively on Apple's MLX framework. On an M5 Max running Qwen3.5-35B-A3B, it reaches 1851 tok/s prefill and 134 tok/s decode. The release also adds NVFP4 quantization for production parity with NVIDIA inference, plus improved KV cache reuse for agentic workloads.
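Throughput figures like the ones above can be derived from Ollama's own API: the `/api/generate` response includes `prompt_eval_count`/`prompt_eval_duration` (prefill) and `eval_count`/`eval_duration` (decode), with durations in nanoseconds. A minimal sketch, using illustrative sample values rather than an actual response from the benchmarked setup:

```python
# Sample fields from an Ollama /api/generate response.
# Ollama reports durations in nanoseconds; these values are
# illustrative, not measured from the M5 Max benchmark in the post.
resp = {
    "prompt_eval_count": 512,             # tokens processed during prefill
    "prompt_eval_duration": 276_000_000,  # ns spent on prefill
    "eval_count": 256,                    # tokens generated during decode
    "eval_duration": 1_910_000_000,       # ns spent on decode
}

def tok_per_s(count: int, duration_ns: int) -> float:
    """Convert a token count plus nanosecond duration into tokens/second."""
    return count / (duration_ns / 1e9)

prefill = tok_per_s(resp["prompt_eval_count"], resp["prompt_eval_duration"])
decode = tok_per_s(resp["eval_count"], resp["eval_duration"])
print(f"prefill: {prefill:.0f} tok/s, decode: {decode:.0f} tok/s")
```

Running this against a live server (e.g. `curl http://localhost:11434/api/generate -d '{"model": "...", "prompt": "...", "stream": false}'`) gives a quick sanity check of prefill vs. decode speed on your own hardware.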
solomonneas.dev/intel
relay@relay.infosec.exchange shared this topic