Tokie
tokiers
Pre-built .tkz tokenizer files for tokie — the fast, correct Rust tokenizer.
What is tokie?
tokie is a Rust tokenizer library designed as a drop-in replacement for HuggingFace tokenizers, offering 50x faster tokenization, 10x smaller model files, and 100% accurate output.
It supports BPE (GPT-2, tiktoken, SentencePiece), WordPiece (BERT), and Unigram (T5/XLM-R) encoders, with a custom .tkz binary format that loads in ~5ms.
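For readers unfamiliar with BPE, the core encoding loop can be sketched generically: split a word into symbols, then repeatedly merge the adjacent pair with the lowest merge rank until no known pair remains. This is a textbook sketch, not tokie's implementation, which this page does not describe. The `bpe_merge` function and the toy merge table are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, generic BPE merge loop (not tokie's code).
/// `ranks` maps an adjacent symbol pair to its merge priority; lower merges first.
fn bpe_merge(word: &str, ranks: &HashMap<(String, String), usize>) -> Vec<String> {
    // Start from individual characters (byte-level BPE would start from bytes).
    let mut symbols: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    loop {
        // Find the leftmost adjacent pair with the lowest (best) rank.
        let mut best: Option<(usize, usize)> = None; // (rank, index)
        for i in 0..symbols.len().saturating_sub(1) {
            let pair = (symbols[i].clone(), symbols[i + 1].clone());
            if let Some(&rank) = ranks.get(&pair) {
                if best.map_or(true, |(r, _)| rank < r) {
                    best = Some((rank, i));
                }
            }
        }
        match best {
            // Merge the chosen pair into one symbol and rescan.
            Some((_, i)) => {
                let merged = format!("{}{}", symbols[i], symbols[i + 1]);
                symbols[i] = merged;
                symbols.remove(i + 1);
            }
            // No adjacent pair has a merge rule: encoding is done.
            None => break,
        }
    }
    symbols
}
```

With a toy table where `("l","o")` has rank 0, `("lo","w")` rank 1 and `("e","r")` rank 2, `bpe_merge("lower", &ranks)` yields `["low", "er"]`. Production encoders (GPT-2, tiktoken) work on bytes and use faster data structures than this quadratic rescan.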
What's on this org?
This organization hosts pre-built .tkz tokenizer files for popular models. Each repo contains the original model's tokenizer converted to tokie's compact binary format.
Using a model from this org
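```rust
use tokie::Tokenizer;
// Assumes tokie's error type implements std::error::Error, so `?` can box it
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Loads tokenizer.tkz from this org automatically
    let tokenizer = Tokenizer::from_pretrained("tokiers/ms-marco-MiniLM-L-6-v2")?;
    let tokens = tokenizer.encode("Hello, world!", true);
    Ok(())
}
```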
from_pretrained() looks for tokenizer.tkz first and falls back to tokenizer.json, so these repos remain compatible with the standard HuggingFace loading flow.
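The lookup order described above can be sketched with the standard library alone. This assumes the repo files are already in a local directory; `pick_tokenizer_file` is a hypothetical helper, not part of tokie's API, and the real `from_pretrained()` also handles fetching from the Hub, which is omitted here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not tokie's API): choose which tokenizer file to load
/// from a local model directory, preferring tokie's binary format.
fn pick_tokenizer_file(dir: &Path) -> Option<PathBuf> {
    // Preference order mirrors the description: .tkz first, then tokenizer.json.
    ["tokenizer.tkz", "tokenizer.json"]
        .iter()
        .map(|name| dir.join(name))
        .find(|path| path.is_file())
}
```

A directory holding only tokenizer.json resolves to that file; once tokenizer.tkz is present, it wins.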
Links
- GitHub: chonkie-inc/tokie
- crates.io: tokie
- Built by: chonkie-inc
Models (94)
- tokiers/potion-science-32M
- tokiers/potion-retrieval-32M
- tokiers/potion-multilingual-128M
- tokiers/potion-base-8M
- tokiers/potion-base-4M
- tokiers/potion-base-32M
- tokiers/potion-base-2M
- tokiers/potion-8m-edu-classifier
- tokiers/M2V_multilingual_output

Datasets (0)
None public yet.