# Qwen3.5-35B-A3B pre-packed for Flash-MoE (tiered: hot experts 4-bit, cold experts 2-bit)
Pre-packed expert weights for Flash-MoE inference engine.
## File Format
All `.bin` files are raw numeric arrays (uint32-packed 4-bit nibbles plus float16 scales and biases), NOT pickle or safetensors. They are generated by `repack_experts.py`, which reads the safetensors checkpoint and writes raw binary blobs. No file contains executable code.
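To make the packed layout concrete, here is a minimal sketch of unpacking eight 4-bit values from one uint32 word and applying an affine scale/bias dequantization. The lowest-nibble-first order and the `w = q * scale + bias` scheme are assumptions for illustration; `repack_experts.py` is the authoritative reference for the actual layout.

```python
def unpack_nibbles(word: int) -> list[int]:
    # Extract eight 4-bit values from one uint32,
    # lowest nibble first (assumed packing order).
    return [(word >> (4 * i)) & 0xF for i in range(8)]

def dequantize(q: int, scale: float, bias: float) -> float:
    # Affine 4-bit dequantization: w = q * scale + bias
    # (assumed scheme; scale/bias are stored as float16 pairs).
    return q * scale + bias

# Example: 0x12345678 unpacks to [8, 7, 6, 5, 4, 3, 2, 1]
print(unpack_nibbles(0x12345678))
```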
**Security note:** Hugging Face may flag `.bin` files as "unsafe"; this is a false positive. These files contain only quantized weight data (4-bit packed integers plus float16 scale/bias pairs): no pickle, no Python objects, no executable content.
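You can sanity-check this yourself: pickle streams (protocol 2 and later, the default in modern Python) begin with the PROTO opcode byte `0x80`, so a raw weight blob will not match. A small sketch (the helper name is ours, not part of the repo):

```python
def looks_like_pickle(path: str) -> bool:
    # Pickle protocol 2+ streams start with the PROTO opcode 0x80
    # followed by a protocol number (currently <= 5). Raw quantized
    # weight data has no such framing.
    with open(path, "rb") as f:
        head = f.read(2)
    return len(head) == 2 and head[0] == 0x80 and head[1] <= 5
```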
## Contents
- `config.json` - Model architecture (from mlx-community/Qwen3.5-35B-A3B-4bit)
- `model_weights.bin` - Non-expert weights (~1.4 GB, mmap'd at runtime)
- `model_weights.json` - Tensor name → offset manifest
- `packed_experts_tiered/layer_XX.bin` - Per-layer expert weights (40 files, ~300 MB each)
- `tokenizer.bin` - Pre-exported BPE tokenizer
- `tokenizer.json` - HuggingFace tokenizer config
- `vocab.bin` - Token vocabulary
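Because `model_weights.json` maps tensor names to byte offsets, individual tensors can be sliced out of the mmap'd blob without reading the whole 1.4 GB file. A hypothetical sketch; the actual JSON schema of the manifest may differ from the `{"name": {"offset": ..., "size": ...}}` shape assumed here:

```python
import json
import mmap

def load_tensor_bytes(manifest_path: str, weights_path: str, name: str) -> bytes:
    # Assumed manifest layout: {"tensor.name": {"offset": N, "size": M}, ...}
    with open(manifest_path) as f:
        entry = json.load(f)[name]
    with open(weights_path, "rb") as f:
        # mmap lets the OS page in only the bytes we slice.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return mm[entry["offset"] : entry["offset"] + entry["size"]]
```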
## Usage
```bash
# Clone and run with Flash-MoE
git clone https://github.com/Alexintosh/flash-moe
cd flash-moe/metal_infer && make
./infer --tiered --model /path/to/this/repo --prompt "Hello" --tokens 100
```
Alternatively, use the Flash-MoE iOS app to download the weights and run inference directly on an iPhone.