Runeforge_Core-7b

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Linear merge method, with mistralai/Mistral-7B-v0.3 as the base model.

Models Merged

The following models were merged with the base model:

  • dreamgen/WizardLM-2-7B
  • uukuguy/speechless-code-mistral-7b-v2.0

Configuration

The following YAML configuration was used to produce this model:

# Clean 3-way dense merge rebuild for runeforge_core-7b
models:
  - model: mistralai/Mistral-7B-v0.3
    parameters:
      weight: 0.4
  - model: dreamgen/WizardLM-2-7B
    parameters:
      weight: 0.3
  - model: uukuguy/speechless-code-mistral-7b-v2.0
    parameters:
      weight: 0.3
merge_method: linear
base_model: mistralai/Mistral-7B-v0.3
dtype: float16
out_dtype: float16
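
As a rough illustration of what a linear merge does with these weights, the sketch below averages toy parameter vectors in pure Python. The function name `linear_merge`, the `normalize` flag as written here, and the toy tensor values are assumptions made for illustration; mergekit operates on full model tensors and its implementation may differ in detail.

```python
from typing import Dict, List


def linear_merge(
    tensors: Dict[str, List[float]],
    weights: Dict[str, float],
    normalize: bool = True,
) -> List[float]:
    """Element-wise weighted average of same-length parameter vectors."""
    total = sum(weights.values())
    if normalize and total == 0:
        raise ValueError("weights must not sum to zero")
    # With normalize=True the weights are rescaled to sum to 1.
    scale = total if normalize else 1.0
    length = len(next(iter(tensors.values())))
    merged = [0.0] * length
    for name, values in tensors.items():
        if len(values) != length:
            raise ValueError(f"shape mismatch for {name}")
        w = weights[name] / scale
        for i, v in enumerate(values):
            merged[i] += w * v
    return merged


# Weights copied from the configuration above; the tensor values are made up.
weights = {
    "mistralai/Mistral-7B-v0.3": 0.4,
    "dreamgen/WizardLM-2-7B": 0.3,
    "uukuguy/speechless-code-mistral-7b-v2.0": 0.3,
}
toy_tensors = {
    "mistralai/Mistral-7B-v0.3": [1.0, 0.0],
    "dreamgen/WizardLM-2-7B": [0.0, 1.0],
    "uukuguy/speechless-code-mistral-7b-v2.0": [1.0, 1.0],
}
merged = linear_merge(toy_tensors, weights)
print([round(x, 4) for x in merged])
```

Because the three configured weights already sum to 1.0, rescaling leaves them unchanged in this example.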

Evaluation

Setup

  • Date: 2026-03-14
  • Runtime: local GPU inference in WSL
  • Loader: Transformers/Unsloth with 4-bit quantization (load_in_4bit)
  • Benchmarks:
    • ARC-Challenge (multiple-choice)
    • HellaSwag (multiple-choice)
    • Winogrande XL (multiple-choice)
    • TruthfulQA MC1 (multiple-choice)
  • Metric: Accuracy per benchmark and macro average across the four tasks
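
For readers who want to approximate the loader setting listed above, here is a minimal sketch of 4-bit loading through the Transformers path. It assumes the `transformers`, `torch`, and `bitsandbytes` packages and a CUDA GPU are available; the helper name `load_model_4bit` and the choice of float16 as compute dtype are assumptions, and the Unsloth path and the project's actual evaluation scripts may be configured differently.

```python
def load_model_4bit(model_id: str):
    """Load a causal LM and its tokenizer with 4-bit weights (sketch)."""
    # Imported lazily so this file can be imported without the heavy packages.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # load_in_4bit matches the setting named in the Setup list;
    # the compute dtype is an assumption, not taken from the card.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    return model, tokenizer
```

Pass the repository ID or a local folder containing the merged weights as `model_id`.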

Primary Comparison (200 samples per benchmark)

Model                          | ARC    | HellaSwag | Winogrande | TruthfulQA MC1 | Macro Avg
runeforge_core-7b (this model) | 0.7650 | 0.7050    | 0.6000     | 0.5800         | 0.6625
mistral-7b baseline            | 0.7000 | 0.6000    | 0.4600     | 0.5900         | 0.5875

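Interpretation: runeforge_core-7b outperformed the local Mistral baseline by +0.0750 macro accuracy (0.6625 vs 0.5875) on this evaluation run. It led on ARC, HellaSwag, and Winogrande and trailed slightly on TruthfulQA MC1 (0.5800 vs 0.5900).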
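
The macro averages and the gap quoted above can be re-derived from the table values. The snippet below is a small check, not the project's evaluation script; the dictionary layout and the helper name `macro_average` are assumptions for illustration.

```python
from statistics import mean

# Per-benchmark accuracies copied from the 200-sample table.
scores = {
    "runeforge_core-7b": {
        "ARC": 0.7650,
        "HellaSwag": 0.7050,
        "Winogrande": 0.6000,
        "TruthfulQA MC1": 0.5800,
    },
    "mistral-7b baseline": {
        "ARC": 0.7000,
        "HellaSwag": 0.6000,
        "Winogrande": 0.4600,
        "TruthfulQA MC1": 0.5900,
    },
}


def macro_average(task_scores):
    """Unweighted mean of the per-benchmark accuracies."""
    return mean(list(task_scores.values()))


macro = {model: macro_average(s) for model, s in scores.items()}
delta = macro["runeforge_core-7b"] - macro["mistral-7b baseline"]

# Per-benchmark gap (this model minus baseline).
per_task_delta = {
    task: scores["runeforge_core-7b"][task] - scores["mistral-7b baseline"][task]
    for task in scores["runeforge_core-7b"]
}

for model, value in macro.items():
    print(f"{model}: {value:.4f}")
print(f"macro delta: {delta:+.4f}")
for task, value in per_task_delta.items():
    print(f"{task}: {value:+.4f}")
```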

Expanded Comparison (30 samples per benchmark)

Model                           | ARC    | HellaSwag | Winogrande | TruthfulQA MC1 | Macro Avg
runeforge_core-7b (this model)  | 0.8000 | 0.7333    | 0.5000     | 0.4333         | 0.6167
mistral-7b baseline             | 0.7000 | 0.6000    | 0.4667     | 0.5000         | 0.5667
speechless-code-mistral-7b-v2.0 | 0.6000 | 0.3000    | 0.5333     | 0.6000         | 0.5083
dreamgen/WizardLM-2-7B          | 0.2000 | 0.2333    | 0.6667     | 0.7667         | 0.4667
runeforge_mk1_merged_from_7922  | 0.0000 | 0.0000    | 0.0000     | 0.0000         | 0.0000

Note: the expanded table uses a smaller sample size and is more variance-prone; use the 200-sample comparison as the primary signal.
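
To give a sense of how much noisier 30 samples are than 200, the sketch below computes the binomial standard error of an accuracy estimate. It assumes independent samples and uses a true accuracy of 0.5 as a worst case; both are simplifying assumptions, and the helper name is illustrative.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy estimate from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (30, 200):
    se = accuracy_standard_error(0.5, n)
    # 1.96 standard errors gives an approximate 95% interval half-width.
    print(f"n={n}: standard error ~ {se:.3f}, 95% half-width ~ {1.96 * se:.3f}")
```

With 30 samples, a single question also shifts accuracy by 1/30 (about 0.0333), which is why the expanded table's values fall on multiples of that step.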

Coding Sanity Check (Executable)

A separate executable coding sanity check (5 unit-tested tasks) was also run:

Model                          | Passes | Total | Pass Rate
runeforge_core-7b (this model) | 5      | 5     | 1.00
runeforge_mk1_merged_from_7922 | 0      | 5     | 0.00
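
A pass rate of this kind can be computed by executing each generated solution against its unit tests. The following is a minimal sketch under assumed conventions: the task format, the helper names `passes` and `pass_rate`, and the two toy tasks are hypothetical and are not taken from evaluate_coding_exec.py. It has no sandboxing or timeout, so it should only be run on trusted code.

```python
from typing import Dict, List, Tuple

# Hypothetical task format: (generated source, function name, test cases),
# where each test case is (positional args, expected return value).
Case = Tuple[tuple, object]
Task = Tuple[str, str, List[Case]]


def passes(source: str, func_name: str, cases: List[Case]) -> bool:
    """Return True if the generated function satisfies every test case."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)  # executes model output: trusted code only
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors count as failures.
        return False


def pass_rate(tasks: List[Task]) -> Tuple[int, int, float]:
    """Return (passes, total, pass rate) over a list of tasks."""
    passed = sum(passes(*task) for task in tasks)
    return passed, len(tasks), passed / len(tasks)


# Toy stand-ins for model output; the second one is deliberately wrong.
tasks: List[Task] = [
    ("def add(a, b):\n    return a + b\n", "add", [((1, 2), 3), ((-1, 1), 0)]),
    ("def is_even(n):\n    return n % 2 == 1\n", "is_even", [((2,), True)]),
]
print(pass_rate(tasks))
```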

Reproducibility Files

Repository-relative references (from this model folder):

  • ../Making_Runeforge/evaluate_general_models.py
  • ../Making_Runeforge/evaluate_coding_exec.py
  • ../Making_Runeforge/eval_general_runeforge_core_200.json
  • ../Making_Runeforge/eval_general_mistral_base_200.json
  • ../Making_Runeforge/eval_general_leaderboard.json
  • ../Making_Runeforge/runeforge_coding_exec_eval.json

Intended Use

  • General-purpose assistant and instruction-following use cases.
  • Strong performance on local multiple-choice reasoning benchmarks relative to the local Mistral baseline used in this project.
  • Suitable as a base for additional task-specific fine-tuning where broad instruction quality is desired.

Limitations

  • Reported metrics are from local, sampled benchmark runs (not full official leaderboard submissions).
  • Quantized inference (load_in_4bit) was used for evaluation; scores may shift under different precision/runtime setups.
  • Expanded 5-model comparison used 30 samples per benchmark and should be treated as directional.
  • A separate merged artifact (runeforge_mk1_merged_from_7922) showed severe degradation: 0.0000 on all four sampled general benchmarks in the 30-sample comparison and 0/5 on the executable coding sanity check.

Evaluation Notes

  • The 200-sample comparison is the primary result set for this card.
  • The 30-sample expanded table is included for breadth across additional local peer models.
  • All benchmark scripts and JSON outputs are listed under Reproducibility Files.