Qwen3.5-35B-A3B - Pre-packed for Flash-MoE (tiered: hot = 4-bit, cold = 2-bit)

Pre-packed expert weights for Flash-MoE inference engine.

File Format

All .bin files are raw numeric arrays (quantized weights packed into uint32 words, plus float16 scales/biases), NOT pickle/safetensors. They are generated by repack_experts.py, which reads the source safetensors and writes raw binary blobs. There is no executable code in any file.

Security note: HuggingFace may flag .bin files as "unsafe"; this is a false positive. These files contain only quantized weight data (4-bit or 2-bit packed integers plus float16 scale/bias pairs). No pickle, no Python objects, no executable content.
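To make the "only numeric data" claim concrete, here is a minimal sketch of the general nibble-packing scheme such files use. The exact layout written by repack_experts.py is not documented in this card; this assumes the common convention that each uint32 word holds eight unsigned 4-bit values (low nibble first) and that a group of weights shares one float16 (scale, bias) pair, so w[i] = scale * q[i] + bias:

```python
def unpack_nibbles(word: int) -> list:
    """Extract 8 unsigned 4-bit values from one 32-bit word (low nibble first)."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

def dequantize(words, scale, bias):
    """Dequantize uint32-packed 4-bit weights sharing one scale/bias pair."""
    return [scale * q + bias for w in words for q in unpack_nibbles(w)]

# Example: one word packing the values 0..7, low nibble first
word = 0x76543210
print(unpack_nibbles(word))           # [0, 1, 2, 3, 4, 5, 6, 7]
print(dequantize([word], 0.5, -1.0))  # [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

The 2-bit cold tier would follow the same pattern with sixteen 2-bit values per word; either way, the data is pure integers and float16 metadata, never code.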

Contents

  • config.json - Model architecture (from mlx-community/Qwen3.5-35B-A3B-4bit)
  • model_weights.bin - Non-expert weights (~1.4 GB, mmap'd at runtime)
  • model_weights.json - Tensor name → offset manifest
  • packed_experts_tiered/layer_XX.bin - Per-layer expert weights (40 files, ~300 MB each)
  • tokenizer.bin - Pre-exported BPE tokenizer
  • tokenizer.json - HuggingFace tokenizer config
  • vocab.bin - Token vocabulary

Usage

# Clone and run with Flash-MoE
git clone https://github.com/Alexintosh/flash-moe
cd flash-moe/metal_infer && make
./infer --tiered --model /path/to/this/repo --prompt "Hello" --tokens 100

Or use the Flash-MoE iOS app to download and run directly on iPhone.
