CRITICAL FIX (2026-03-19): Fixed eos_token_id — previous versions caused infinite thinking loops. You MUST re-download this model if you downloaded it before this date.

Update (2026-03-18): Models have been updated to v2.1.0 with VLM support, proper tokenizer, and fixed configs. If you downloaded before this date, please re-download for full MLX Studio compatibility.

MLX Studio — the only app that natively supports JANG models


Early Adoption: LM Studio, Ollama, oMLX, and Inferencer do not support JANG yet. Use MLX Studio or `pip install "jang[mlx]"`. Ask your favorite app's creators to add JANG support!


Qwen3.5-9B — JANG_4S (4.34-bit) — VLM

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

GitHub  PyPI  Website  X/Twitter

JANG is fully open-source. Quantization engine, research, and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.

Results (200-question MMLU)

| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_4S (4.34-bit) | 73.0% | 6.0 GB | — |
| MLX 4-bit | 72.5% | 4.7 GB | — |
| MLX 3-bit | 64.0% | 3.7 GB | — |
| MLX 2-bit | 22.0% | 2.6 GB | — |

JANG_4S edges out MLX 4-bit on this 9B model (73.0% vs 72.5%): keeping attention at 6-bit preserves quality.
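A minimal sketch of how mixed grading produces a fractional average bit-width. The 17%/83% split below is a hypothetical illustration chosen to reproduce 4.34 bits, not the actual JANG_4S tensor layout:

```python
def average_bits(groups):
    """Parameter-weighted average bit-width; groups = [(fraction_of_params, bits), ...]."""
    return sum(frac * bits for frac, bits in groups)

# Hypothetical split: ~17% of parameters graded CRITICAL at 6-bit,
# the remaining ~83% at 4-bit (illustrative numbers only).
layout = [(0.17, 6), (0.83, 4)]
print(round(average_bits(layout), 2))  # -> 4.34
```

Because CRITICAL tensors are a minority of total parameters, raising them to 6-bit costs only a fraction of a bit on average.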

Per-Subject Scores

| Subject | JANG_4S | MLX_4bit | MLX_3bit | MLX_2bit |
|---|---|---|---|---|
| Abstract Algebra | 9/20 | 11/20 | 8/20 | 4/20 |
| Anatomy | 16/20 | 15/20 | 13/20 | 6/20 |
| Astronomy | 20/20 | 20/20 | 16/20 | 5/20 |
| College CS | 14/20 | 13/20 | 10/20 | 7/20 |
| College Physics | 13/20 | 13/20 | 12/20 | 6/20 |
| HS Biology | 18/20 | 18/20 | 19/20 | 4/20 |
| HS Chemistry | 15/20 | 14/20 | 15/20 | 4/20 |
| HS Mathematics | 8/20 | 9/20 | 5/20 | 2/20 |
| Logical Fallacies | 17/20 | 16/20 | 16/20 | 3/20 |
| World Religions | 16/20 | 16/20 | 14/20 | 3/20 |
| Total (/200) | 146 | 145 | 128 | 44 |
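The totals row can be re-derived directly from the per-subject scores; a quick check of the JANG_4S column (values copied from the table above):

```python
# Per-subject correct answers (out of 20 each) for JANG_4S, from the table above.
jang_4s = {
    "Abstract Algebra": 9, "Anatomy": 16, "Astronomy": 20,
    "College CS": 14, "College Physics": 13, "HS Biology": 18,
    "HS Chemistry": 15, "HS Mathematics": 8,
    "Logical Fallacies": 17, "World Religions": 16,
}
total = sum(jang_4s.values())                     # 146 correct out of 200
accuracy = 100 * total / (20 * len(jang_4s))
print(total, f"{accuracy:.1f}%")                  # -> 146 73.0%
```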

Specs

| Metric | Value |
|---|---|
| Source | Qwen3.5-9B |
| Profile | JANG_4S (CRITICAL=6, IMPORTANT=4, COMPRESS=4) |
| Average bits | 4.34 |
| VLM | Yes (333 vision tensors) |
| Speed | ~70 tok/s |
| Format | v2 (MLX-native, instant load) |
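As a sanity check on the 6.0 GB file size, a back-of-the-envelope estimate — the 9B parameter count is taken from the model name, and the gap to 6.0 GB would be filled by quantization scales/biases, embeddings, and the 333 vision tensors:

```python
# Packed weight bytes implied by a 4.34-bit average over ~9B parameters.
params = 9e9
avg_bits = 4.34
raw_gb = params * avg_bits / 8 / 1e9
print(f"{raw_gb:.2f} GB")  # ~4.88 GB of packed weights, before scales and vision tensors
```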

Install

```shell
pip install "jang[mlx]"
```

For Vision-Language models:

```shell
pip install "jang[vlm]"
```

Quick Start

```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-9B-JANG_4S")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    # generate_step yields (token, logprobs); the token may be an mx.array or a
    # plain int depending on the mlx_lm version, so normalize it here.
    t = tok.item() if hasattr(tok, "item") else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break
```

VLM Inference

```python
from jang_tools.loader import load_jang_vlm_model
from mlx_vlm import generate

model, processor = load_jang_vlm_model("JANGQ-AI/Qwen3.5-9B-JANG_4S")

# Build a chat prompt that references the image; thinking mode disabled.
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image."}
    ]}], add_generation_prompt=True, tokenize=False, enable_thinking=False)

result = generate(model, processor, prompt, ["photo.jpg"], max_tokens=200)
print(result.text)
```

Korean

Qwen3.5-9B — JANG 4S

JANG is a mixed-precision quantization format for Apple Silicon. It fills the same role for MLX that GGUF fills elsewhere.

| Model | MMLU | Size |
|---|---|---|
| JANG_4S | 73.0% | 6.0 GB |
| MLX 4-bit | 72.5% | 4.7 GB |

Install

```shell
pip install "jang[mlx]"
```

Compatibility

Currently only **MLX Studio** natively supports the JANG format. LM Studio, Ollama, and others do not support it yet.

GitHub · HuggingFace · MLX Studio · PyPI

Created by Jinho Jang — jangq.ai · @dealignai
