AMD Ryzen AI Max+ 395 Strix Halo
Quantized models benchmarked with the Windows ROCm llama.cpp builds from Lemonade, using the recommended parameters. OpenCode testing was done in WSL.
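For local testing the models are served over an OpenAI-compatible HTTP API, and OpenCode can be pointed at the same endpoint as a local provider. Below is a minimal sketch for sanity-checking that endpoint from WSL before starting an OpenCode session; the address, port, and model field are assumptions (llama-server defaults), not part of the setup described above, so adjust them to your own Lemonade/llama.cpp configuration.

```ts
// Sanity check for a local OpenAI-compatible llama.cpp / Lemonade endpoint
// before pointing OpenCode at it. Host, port, and model name are assumptions:
// adjust them to match your own server configuration.
const BASE_URL = "http://127.0.0.1:8080/v1";

async function checkEndpoint(): Promise<void> {
  // 1. Ask the server which model(s) it has loaded.
  const models = await fetch(`${BASE_URL}/models`).then((r) => r.json());
  console.log("loaded models:", JSON.stringify(models));

  // 2. Run one short completion to confirm generation works end to end.
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "default", // a single-model llama-server typically ignores this field
      messages: [{ role: "user", content: "Reply with the single word: ok" }],
      max_tokens: 8,
    }),
  });
  const data = await res.json();
  console.log("reply:", data.choices?.[0]?.message?.content);
}

checkEndpoint().catch((err) => console.error("endpoint check failed:", err));
```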
Text Generation • 1B • 145 t/s @ Q8_0. Surprisingly capable in chat. Not usable in OpenCode.
ggml-org/gpt-oss-20b-GGUF
21B • 60 t/s @ MXFP4. OpenCode tools work. Prefer 120B.
mradermacher/Nanbeige4.1-3B-GGUF
4B • 51 t/s @ Q8_0. Thinks for minutes. Not usable in OpenCode.
unsloth/GLM-4.7-Flash-GGUF
Text Generation • 30B • 45 t/s @ Q8_0. OpenCode tool calling works great. Made a nice-looking 400-line OpenMeteo weather app with typeahead search (see the sketch after this list); required manual TypeScript error fixes to run. Note that the smaller REAP model wasn't faster.
bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF
Text Generation • 49B • 45 t/s @ Q8_0. Excellent OpenCode tool calling, including the todo-list and ask-question tools. Made a 600-line OpenMeteo weather app with no errors. Note that it did everything the frontend-design skill said NOT to do, resulting in a comically bad-looking app. Most usable model on this list.
ggml-org/gpt-oss-120b-GGUF
117B • 42 t/s @ MXFP4. Good OpenCode tool calling and writes working TypeScript, but even the frontend-design skill can't get it to make attractive websites. Feels like GPT-4o, which is nice for nostalgia.
unsloth/Qwen3-Coder-Next-GGUF
Text Generation • 80B • 32 t/s @ Q8_0. Could not build a working OpenMeteo weather app: struggled with the edit tool while attempting to fix errors, and could not properly trace errors in the code.
Intel/MiniMax-M2-REAP-172B-A10B-gguf-q2ks-mixed-AutoRound
173B • 26 t/s @ Q2_K_S.
unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
Image-Text-to-Text • 108B • 15 t/s @ UD-IQ3_XXS.
unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
24B • 9 t/s @ Q8_0. All dense models are slow on Strix Halo. Speculative decoding (ngram-mod) works very well when it kicks in.
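For context on the coding task referenced in several notes above: the weather apps were built against the public Open-Meteo APIs. Here is a minimal TypeScript sketch of the two calls such an app needs (a geocoding search backing the typeahead, then a forecast fetch). The query parameters reflect the current public Open-Meteo endpoints and are not taken from any of the generated apps.

```ts
// Minimal sketch of the Open-Meteo calls behind the benchmark task:
// a geocoding lookup (used for the typeahead search) and a forecast fetch.
// Parameter and field names follow the public Open-Meteo API documentation.

interface GeoResult {
  name: string;
  country?: string;
  latitude: number;
  longitude: number;
}

async function searchCity(query: string): Promise<GeoResult[]> {
  const url = `https://geocoding-api.open-meteo.com/v1/search?name=${encodeURIComponent(query)}&count=5`;
  const data = await fetch(url).then((r) => r.json());
  return data.results ?? [];
}

async function currentWeather(lat: number, lon: number) {
  const url = `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lon}&current_weather=true`;
  const data = await fetch(url).then((r) => r.json());
  return data.current_weather; // { temperature, windspeed, weathercode, ... }
}

async function main() {
  const [city] = await searchCity("Berlin");
  if (!city) throw new Error("no geocoding result");
  const weather = await currentWeather(city.latitude, city.longitude);
  console.log(`${city.name}, ${city.country}: ${weather.temperature}°C`);
}

main().catch(console.error);
```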