مِسراج — Misraj AI

Built on Trust. Measured by Impact.
The next-generation Arabic AI lab — building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.


🧭 About Us

Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.

We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems — all purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research.

From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.

📊 15+ research papers · 35+ billion open Arabic data tokens · Honored by AI Pioneers


🏢 Areas of Expertise

Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:

  • 🏥 Healthcare Technology — Clinical documentation and Arabic medical NLP
  • 🏦 Financial Technology — Document intelligence for banking and finance
  • ⚖️ Legal Technology — Contract analysis and legal document processing
  • 🎓 Educational Technology — Arabic learning and knowledge systems
  • 🏛️ Administrative Technology — Government and enterprise document automation

📈 Open Benchmarks & SOTA Results

We develop rigorous, expert-verified benchmarks to establish clear performance standards for Arabic AI. On these benchmarks, our models lead or are competitive with both open-source and commercial systems.

| Benchmark | Focus | Key Performance (SOTA) |
|---|---|---|
| Misraj-DocOCR | Arabic Document OCR | Baseer achieves 0.25 WER, outperforming Azure AI and Gemini 2.5 Pro. |
| KITAB-Reviewed | PDF-to-Markdown | Baseer leads in structure with a TEDS of 56 and a MARS score of 68.13. |
| Tarjama-25 | Bi-directional Translation | Mutarjim (1.5B) outperforms models 20x its size (including GPT-4o mini) in EN→AR. |
| SadeedDiac-25 | Arabic Diacritization | Sadeed achieves a competitive 1.2% Diacritic Error Rate (DER). |
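
For readers unfamiliar with the WER metric cited for Misraj-DocOCR, the sketch below shows the standard textbook definition: word-level edit distance divided by the number of reference words. It is illustrative only. The function name `word_error_rate` is hypothetical, and whitespace tokenization with no text normalization is an assumption; the benchmark's own scoring script may normalize or tokenize Arabic text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: words are split on whitespace and compared exactly,
    with no Arabic-specific normalization (hamza, diacritics, etc.).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of four reference words.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"
print(word_error_rate(reference, hypothesis))  # 0.25
```

In this toy example a single missing hamza counts as a full word error, which is why the choice of normalization matters when comparing WER figures across OCR systems.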

📦 Open Datasets

Our large-scale datasets provide the foundational fuel for high-performance Arabic model training.

| Dataset | Description | Scale |
|---|---|---|
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Arabic-Image-Captioning | Multimodal Arabic captioning pairs | 100M pairs |
| Sadeed Tashkeela | Cleaned & expert-filtered diacritization corpus | 1.05M samples |

📊 35+ billion open Arabic data tokens released and growing.


📬 Connect With Us

| Platform | Link |
|---|---|
| 🌐 Misraj AI | misraj.ai/en |
| 🌐 Misraj Technology | misraj.sa/en |
| 🔵 Baseer OCR | baseerocr.com |
| 🤗 Hugging Face | huggingface.co/Misraj |
| 💼 LinkedIn | linkedin.com/company/aimisraj |
| 🐦 X / Twitter | @aimisraj |
| 💻 GitHub | github.com/misraj-ai |
| 📸 Instagram | @misraj__ai |