
MLX Studio — the only app that natively supports JANG models with reasoning



Mistral Small 4 (119B-A6B) — JANG_2L (2.14-bit) — Reasoning + VLM

JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX


JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.


First Mistral Small 4 (119B) on Apple Silicon. MLA attention + 128 MoE experts + Pixtral VLM. 5x faster prefill than MLX Community 4-bit.

Reasoning mode: Set reasoning_effort to "high" for step-by-step reasoning with [THINK]...[/THINK] tags.
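A minimal sketch of enabling reasoning from a client, assuming MLX Studio exposes an OpenAI-compatible chat endpoint on localhost; the port and the `reasoning_effort` pass-through are assumptions, not documented behavior:

```python
import requests

# Assumption: MLX Studio serves an OpenAI-compatible endpoint locally.
# The port and reasoning_effort pass-through are illustrative, not documented.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_2L",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "reasoning_effort": "high",  # should emit [THINK]...[/THINK] traces
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```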


Speed Comparison — JANG vs MLX Community

| Model | Size | Gen tok/s | Prefill tok/s | RAM | Fits On |
|---|---|---|---|---|---|
| JANG_2L (this model) | 30 GB | 82 | 216 | 40 GB | 48 GB Macs |
| JANG_4M | 57 GB | 80 | 202 | 68 GB | 96+ GB Macs |
| JANG_6M | 84 GB | 74 | 160 | 95 GB | 128+ GB Macs |
| MLX Community 4-bit | 63 GB | 84 | 43 | 68 GB | 96+ GB Macs |
  • 5x faster prefill (216 vs 43 tok/s)
  • Half the size (30 GB vs 63 GB) at comparable generation speed
  • Benchmarked on M3 Ultra 256 GB with bfloat16 compute

Key Features

  • 82 tok/s generation on M3 Ultra — matches MLX 4-bit at half the size
  • 30 GB on disk, 40 GB peak RAM — fits 48 GB Macs (M4 Pro, M2/M3 Max)
  • Vision (VLM): Pixtral encoder, image inputs up to 1540 px
  • Reasoning mode: [THINK]...[/THINK] step-by-step reasoning
  • Code generation: Complete functions with docstrings and optimized logic
  • Math: Step-by-step calculations (e.g., applying the distributive property)
  • 119B total / 6B active per token — MLA attention + 128 MoE experts
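The size bullets above follow directly from the average bit width; a quick back-of-envelope check of the 2.14-bit figure against the ~30 GB on-disk footprint:

```python
# Back-of-envelope size check for the figures quoted above.
total_params = 119e9   # total parameters (MoE; 6B active per token)
avg_bits = 2.14        # JANG_2L average bits per weight
weights_gb = total_params * avg_bits / 8 / 1e9
print(f"~{weights_gb:.0f} GB of quantized weights")
# ~32 GB, in the same ballpark as the ~30 GB on-disk figure
```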

Architecture

JANG_2L Bit Allocation

| Tensor Type | Bits | Purpose |
|---|---|---|
| Attention (q/k/v/o projections) | 8 | Critical: preserves MLA precision |
| Embeddings, lm_head | 8 | Critical: token representation |
| MoE gate (router) | 16 | Float16 passthrough: routing precision |
| Shared experts | 6 | Important: always active |
| Routed experts (128) | 2 | Compressed: many experts = redundancy |
| Norms, biases | full | Float passthrough: tiny tensors, kept exact |
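As an illustration only, a per-tensor schedule like the one above can be expressed as a name-to-bits mapping. The tensor-name patterns below are assumptions that mirror the published table; they do not reflect the actual jangq implementation:

```python
# Illustrative per-tensor bit schedule mirroring the JANG_2L table above.
# Pattern names are assumptions; the real jangq engine may key tensors differently.
def jang_2l_bits(tensor_name: str):
    """Return target bit width for a tensor, or None for float passthrough."""
    if "norm" in tensor_name or "bias" in tensor_name:
        return None                  # tiny tensors: keep exact
    if "gate" in tensor_name and "expert" not in tensor_name:
        return 16                    # MoE router: float16 passthrough
    if any(p in tensor_name for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return 8                     # attention: preserve MLA precision
    if "embed" in tensor_name or "lm_head" in tensor_name:
        return 8                     # token representation
    if "shared_expert" in tensor_name:
        return 6                     # always-active experts (checked before generic rule)
    if "expert" in tensor_name:
        return 2                     # 128 routed experts: redundancy tolerates 2-bit
    return 8                         # conservative default

print(jang_2l_bits("layers.0.mlp.experts.17.up_proj"))  # -> 2
```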

Results: 94.0% MMLU at 2-bit (200 Questions, Reasoning Mode)

| Subject | Score |
|---|---|
| Abstract Algebra | 13/20 (65%) |
| Anatomy | 20/20 (100%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 19/20 (95%) |
| HS Chemistry | 20/20 (100%) |
| HS Mathematics | 18/20 (90%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| Total | 188/200 (94.0%) |

Five subjects at a perfect 100%. 119B-class intelligence in 30 GB at 82 tok/s.

Requirements

  • MLX Studio for native JANG support with reasoning
  • Or: an Apple Silicon Mac with 48+ GB unified memory

Install

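A hypothetical quick-start sketch: the `jangq` PyPI package name is inferred from the repo, and loading the JANG weights through `mlx-lm` is an assumption; MLX Studio remains the documented runtime for JANG models with reasoning:

```python
# Install (assumed package names):
#   pip install jangq mlx-lm
#
# Loading JANG weights via mlx-lm is an assumption; MLX Studio is the
# documented way to run JANG models with reasoning support.
from mlx_lm import load, generate

model, tokenizer = load("JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_2L")
print(generate(model, tokenizer,
               prompt="Explain MoE routing in one sentence.",
               max_tokens=128))
```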

Created by Jinho Jang · jangq.ai · @dealignai
