Qwen3.5-35B-A3B - Pre-packed for Flash-MoE (tiered: hot = 4-bit, cold = 2-bit)

Pre-packed expert weights for Flash-MoE inference engine.

File Format

All .bin files are raw numeric arrays (quantized weights packed into uint32 words, plus float16 scales/biases), NOT pickle/safetensors. They are generated by repack_experts.py, which reads the source safetensors and writes raw binary blobs. There is no executable code in any file.

Security note: HuggingFace may flag .bin files as "unsafe"; this is a false positive. These files contain only quantized weight data (4-bit or 2-bit packed integers plus float16 scale/bias pairs). No pickle, no Python objects, no executable content.
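To make the "only numeric data" claim concrete, here is a minimal sketch of the general nibble-packing scheme such files use. The exact layout written by repack_experts.py is not documented in this card; this assumes the common convention that each uint32 word holds eight unsigned 4-bit values (low nibble first) and that a group of weights shares one float16 (scale, bias) pair, so w[i] = scale * q[i] + bias:

```python
def unpack_nibbles(word: int) -> list:
    """Extract 8 unsigned 4-bit values from one 32-bit word (low nibble first)."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

def dequantize(words, scale, bias):
    """Dequantize uint32-packed 4-bit weights sharing one scale/bias pair."""
    return [scale * q + bias for w in words for q in unpack_nibbles(w)]

# Example: one word packing the values 0..7, low nibble first
word = 0x76543210
print(unpack_nibbles(word))           # [0, 1, 2, 3, 4, 5, 6, 7]
print(dequantize([word], 0.5, -1.0))  # [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

The 2-bit cold tier would follow the same pattern with sixteen 2-bit values per word; either way, the data is pure integers and float16 metadata, never code.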

Contents

  • config.json - Model architecture (from mlx-community/Qwen3.5-35B-A3B-4bit)
  • model_weights.bin - Non-expert weights (~1.4 GB, mmap'd at runtime)
  • model_weights.json - Tensor name → offset manifest
  • packed_experts_tiered/layer_XX.bin - Per-layer expert weights (40 files, ~300 MB each)
  • tokenizer.bin - Pre-exported BPE tokenizer
  • tokenizer.json - HuggingFace tokenizer config
  • vocab.bin - Token vocabulary

Usage

# Clone and run with Flash-MoE
git clone https://github.com/Alexintosh/flash-moe
cd flash-moe/metal_infer && make
./infer --tiered --model /path/to/this/repo --prompt "Hello" --tokens 100

Or use the Flash-MoE iOS app to download and run directly on iPhone.
