
MLX Studio — the only app that natively supports JANG models with reasoning



Mistral Small 4 (119B-A6B) — JANG_2L (2.14-bit) — Reasoning + VLM

JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX


JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.


First Mistral Small 4 (119B) on Apple Silicon. MLA attention + 128 MoE experts + Pixtral VLM. 5x faster prefill than MLX Community 4-bit.

Reasoning mode: Set reasoning_effort to "high" for step-by-step reasoning with [THINK]...[/THINK] tags.
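A minimal sketch of enabling reasoning from a client, assuming MLX Studio exposes an OpenAI-compatible chat endpoint on localhost; the port and the `reasoning_effort` pass-through are assumptions, not documented behavior:

```python
import requests

# Assumption: MLX Studio serves an OpenAI-compatible endpoint locally.
# The port and reasoning_effort pass-through are illustrative, not documented.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_2L",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "reasoning_effort": "high",  # should emit [THINK]...[/THINK] traces
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```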


Speed Comparison — JANG vs MLX Community

| Model | Size | Gen tok/s | Prefill tok/s | RAM | Fits On |
|---|---|---|---|---|---|
| JANG_2L (this model) | 30 GB | 82 | 216 | 40 GB | 48 GB Macs |
| JANG_4M | 57 GB | 80 | 202 | 68 GB | 96+ GB Macs |
| JANG_6M | 84 GB | 74 | 160 | 95 GB | 128+ GB Macs |
| MLX Community 4-bit | 63 GB | 84 | 43 | 68 GB | 96+ GB Macs |
  • 5x faster prefill (216 vs 43 tok/s)
  • Half the size (30 GB vs 63 GB) at comparable generation speed
  • Benchmarked on M3 Ultra 256 GB with bfloat16 compute

Key Features

  • 82 tok/s generation on M3 Ultra — matches MLX 4-bit at half the size
  • 30 GB on disk, 40 GB peak RAM — fits 48 GB Macs (M4 Pro, M2/M3 Max)
  • Vision (VLM): Pixtral encoder, image inputs up to 1540 px
  • Reasoning mode: [THINK]...[/THINK] step-by-step reasoning
  • Code generation: Complete functions with docstrings and optimized logic
  • Math: Step-by-step calculations (e.g., applying the distributive property)
  • 119B total / 6B active per token — MLA attention + 128 MoE experts
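The size bullets above follow directly from the average bit width; a quick back-of-envelope check of the 2.14-bit figure against the ~30 GB on-disk footprint:

```python
# Back-of-envelope size check for the figures quoted above.
total_params = 119e9   # total parameters (MoE; 6B active per token)
avg_bits = 2.14        # JANG_2L average bits per weight
weights_gb = total_params * avg_bits / 8 / 1e9
print(f"~{weights_gb:.0f} GB of quantized weights")
# ~32 GB, in the same ballpark as the ~30 GB on-disk figure
```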

Architecture

JANG_2L Bit Allocation

| Tensor Type | Bits | Purpose |
|---|---|---|
| Attention (q/k/v/o projections) | 8 | Critical: preserves MLA precision |
| Embeddings, lm_head | 8 | Critical: token representation |
| MoE gate (router) | 16 | Float16 passthrough: routing precision |
| Shared experts | 6 | Important: always active |
| Routed experts (128) | 2 | Compressed: many experts = redundancy |
| Norms, biases | full | Float passthrough: tiny tensors, kept exact |
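As an illustration only, a per-tensor schedule like the one above can be expressed as a name-to-bits mapping. The tensor-name patterns below are assumptions that mirror the published table; they do not reflect the actual jangq implementation:

```python
# Illustrative per-tensor bit schedule mirroring the JANG_2L table above.
# Pattern names are assumptions; the real jangq engine may key tensors differently.
def jang_2l_bits(tensor_name: str):
    """Return target bit width for a tensor, or None for float passthrough."""
    if "norm" in tensor_name or "bias" in tensor_name:
        return None                  # tiny tensors: keep exact
    if "gate" in tensor_name and "expert" not in tensor_name:
        return 16                    # MoE router: float16 passthrough
    if any(p in tensor_name for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return 8                     # attention: preserve MLA precision
    if "embed" in tensor_name or "lm_head" in tensor_name:
        return 8                     # token representation
    if "shared_expert" in tensor_name:
        return 6                     # always-active experts (checked before generic rule)
    if "expert" in tensor_name:
        return 2                     # 128 routed experts: redundancy tolerates 2-bit
    return 8                         # conservative default

print(jang_2l_bits("layers.0.mlp.experts.17.up_proj"))  # -> 2
```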

Results: 94.0% MMLU at 2-bit (200 Questions, Reasoning Mode)

| Subject | Score |
|---|---|
| Abstract Algebra | 13/20 (65%) |
| Anatomy | 20/20 (100%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 19/20 (95%) |
| HS Chemistry | 20/20 (100%) |
| HS Mathematics | 18/20 (90%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| Total | 188/200 (94.0%) |

Five subjects at a perfect 100%. 119B-class intelligence in 30 GB at 82 tok/s.

Requirements

  • MLX Studio for native JANG support with reasoning
  • Or: an Apple Silicon Mac with 48+ GB unified memory

Install

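A hypothetical quick-start sketch: the `jangq` PyPI package name is inferred from the repo, and loading the JANG weights through `mlx-lm` is an assumption; MLX Studio remains the documented runtime for JANG models with reasoning:

```python
# Install (assumed package names):
#   pip install jangq mlx-lm
#
# Loading JANG weights via mlx-lm is an assumption; MLX Studio is the
# documented way to run JANG models with reasoning support.
from mlx_lm import load, generate

model, tokenizer = load("JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_2L")
print(generate(model, tokenizer,
               prompt="Explain MoE routing in one sentence.",
               max_tokens=128))
```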

Created by Jinho Jang · jangq.ai · @dealignai
