
Runeforge_Core-7b

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged with the Linear merge method, using mistralai/Mistral-7B-v0.3 as the base.

Models Merged

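In addition to the base model, which also enters the average with weight 0.4, the following models were included in the merge:

- dreamgen/WizardLM-2-7B (weight 0.3)
- uukuguy/speechless-code-mistral-7b-v2.0 (weight 0.3)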

Configuration

The following YAML configuration was used to produce this model:

```yaml
# Clean 3-way dense merge rebuild for runeforge_core-7b
models:
  - model: mistralai/Mistral-7B-v0.3
    parameters:
      weight: 0.4
  - model: dreamgen/WizardLM-2-7B
    parameters:
      weight: 0.3
  - model: uukuguy/speechless-code-mistral-7b-v2.0
    parameters:
      weight: 0.3
merge_method: linear
base_model: mistralai/Mistral-7B-v0.3
dtype: float16
out_dtype: float16
```
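A linear merge computes, for every parameter tensor, a weighted average of the corresponding tensors from the listed models, which is the weight averaging described in the Model soups paper (arXiv 2203.05482). The toy sketch below is not mergekit's implementation. It flattens each parameter into a Python list, and it assumes the weights are normalized by their sum. mergekit's linear method applies that normalization by default, and because 0.4 + 0.3 + 0.3 is already 1.0, normalization does not change the result for this configuration. The model and parameter names in the example are hypothetical stand-ins.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of floats (real merges use tensors).
StateDict = Dict[str, List[float]]


def linear_merge(models: List[StateDict], weights: List[float], normalize: bool = True) -> StateDict:
    """Element-wise weighted average of matching parameters across models."""
    if len(models) != len(weights):
        raise ValueError("expected one weight per model")
    total = sum(weights) if normalize else 1.0
    merged: StateDict = {}
    for name in models[0]:
        tensors = [m[name] for m in models]  # KeyError if a model lacks this parameter
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            sum(w * x for w, x in zip(weights, values)) / total
            for values in zip(*tensors)
        ]
    return merged


# Hypothetical stand-ins for the three models; weights mirror the YAML (0.4 / 0.3 / 0.3).
base = {"layer.weight": [1.0, 2.0]}
wizard = {"layer.weight": [3.0, 0.0]}
code = {"layer.weight": [0.0, 4.0]}
merged = linear_merge([base, wizard, code], [0.4, 0.3, 0.3])
print(merged)
```

The shape check is there because an element-wise average is only defined when every model provides a parameter of the same size.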

Evaluation

Setup

Date: 2026-03-14
Runtime: local GPU inference in WSL
Loader: Transformers/Unsloth with 4-bit quantization (load_in_4bit)
Benchmarks:
- ARC-Challenge (multiple-choice)
- HellaSwag (multiple-choice)
- Winogrande XL (multiple-choice)
- TruthfulQA MC1 (multiple-choice)
Metric: accuracy per benchmark, plus the macro average (unweighted mean) across the four tasks
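The card does not say how evaluate_general_models.py chooses an answer. A common approach for multiple-choice tasks, assumed here and not confirmed by the card, is to score every candidate answer (for example, by its log-likelihood under the model) and predict the highest-scoring one. Accuracy is then the fraction of items where that prediction matches the gold label. The scores below are hypothetical numbers, not model output, and the length-normalized variant some harnesses use is not shown.

```python
from typing import List, Sequence


def predict(choice_scores: Sequence[float]) -> int:
    """Index of the highest-scoring answer choice (argmax)."""
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])


def accuracy(all_scores: List[Sequence[float]], gold: List[int]) -> float:
    """Fraction of items whose argmax choice equals the gold label."""
    if len(all_scores) != len(gold) or not gold:
        raise ValueError("need one gold label per item and at least one item")
    correct = sum(predict(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Hypothetical per-choice log-likelihoods for three 4-way items.
scores = [
    [-4.1, -2.3, -5.0, -3.7],  # argmax is choice 1
    [-1.2, -3.4, -2.8, -0.9],  # argmax is choice 3
    [-2.0, -2.5, -1.5, -4.0],  # argmax is choice 2
]
gold = [1, 0, 2]
print(accuracy(scores, gold))
```

The same functions work for any number of choices, so two-way items (as in Winogrande) and variable-length choice lists (as in TruthfulQA MC1) need no special handling.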

Primary Comparison (200 samples per benchmark)

Model	ARC	HellaSwag	Winogrande	TruthfulQA MC1	Macro Avg
runeforge_core-7b (this model)	0.7650	0.7050	0.6000	0.5800	0.6625
mistral-7b baseline	0.7000	0.6000	0.4600	0.5900	0.5875

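Interpretation: runeforge_core-7b outperformed the local Mistral baseline by +0.0750 macro accuracy (0.6625 vs 0.5875) on this evaluation run. It scored higher on ARC, HellaSwag, and Winogrande, and slightly lower on TruthfulQA MC1 (0.5800 vs 0.5900).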
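The macro averages and the +0.0750 gap can be recomputed directly from the per-benchmark accuracies in the 200-sample table above. This is a minimal arithmetic check, not the project's evaluation script.

```python
from typing import Sequence

# Per-benchmark accuracies from the 200-sample table: ARC, HellaSwag, Winogrande, TruthfulQA MC1.
RUNEFORGE_CORE = [0.7650, 0.7050, 0.6000, 0.5800]
MISTRAL_BASELINE = [0.7000, 0.6000, 0.4600, 0.5900]


def macro_average(accuracies: Sequence[float]) -> float:
    """Unweighted mean over tasks, so every benchmark counts equally."""
    return sum(accuracies) / len(accuracies)


core = macro_average(RUNEFORGE_CORE)
base = macro_average(MISTRAL_BASELINE)
print(f"core={core:.4f} baseline={base:.4f} delta={core - base:+.4f}")
```

Because every task uses the same 200 samples, the macro average here equals pooled accuracy. Back-computing correct counts from the table gives (153 + 141 + 120 + 116) / 800 = 0.6625 for this model.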

Expanded Comparison (30 samples per benchmark)

Model	ARC	HellaSwag	Winogrande	TruthfulQA MC1	Macro Avg
runeforge_core-7b (this model)	0.8000	0.7333	0.5000	0.4333	0.6167
mistral-7b baseline	0.7000	0.6000	0.4667	0.5000	0.5667
uukuguy/speechless-code-mistral-7b-v2.0	0.6000	0.3000	0.5333	0.6000	0.5083
dreamgen/WizardLM-2-7B	0.2000	0.2333	0.6667	0.7667	0.4667
runeforge_mk1_merged_from_7922	0.0000	0.0000	0.0000	0.0000	0.0000

Note: the expanded table uses a much smaller sample (30 items per benchmark), so its scores are noisier; treat the 200-sample comparison as the primary signal.
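A quick way to see how much noisier 30 samples are is the binomial standard error of an accuracy estimate, sqrt(p(1 - p) / n). The sketch below assumes independent items and uses the normal approximation (Wilson intervals behave better at small n). Under those assumptions, the rough 95% half-width for this model's ARC score is about 0.06 at n = 200 but about 0.14 at n = 30.

```python
import math
from typing import Tuple


def binomial_standard_error(p: float, n: int) -> float:
    """Normal-approximation standard error of an accuracy p measured on n items."""
    return math.sqrt(p * (1.0 - p) / n)


def approx_95_interval(p: float, n: int) -> Tuple[float, float]:
    """Rough 95% interval p +/- 1.96 * SE, clipped to [0, 1]."""
    half = 1.96 * binomial_standard_error(p, n)
    return max(0.0, p - half), min(1.0, p + half)


# This model's ARC accuracy at the two sample sizes reported on the card.
for p, n in [(0.7650, 200), (0.8000, 30)]:
    lo, hi = approx_95_interval(p, n)
    se = binomial_standard_error(p, n)
    print(f"n={n:>3}  acc={p:.4f}  SE={se:.4f}  ~95% CI=[{lo:.3f}, {hi:.3f}]")
```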

Coding Sanity Check (Executable)

A separate executable coding sanity check (5 unit-tested tasks) was also run:

Model	Passes	Total	Pass Rate
runeforge_core-7b (this model)	5	5	1.00
runeforge_mk1_merged_from_7922	0	5	0.00
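The card does not describe how evaluate_coding_exec.py runs its five tasks. One plausible shape for an executable, unit-tested check, assumed here for illustration only, is to write each generated solution plus its assert-based tests to a temporary file, run it in a fresh Python interpreter with a timeout, and count the task as passed when the process exits with status 0. The run_task and pass_rate helpers and the example tasks are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_task(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Run solution + tests in a fresh interpreter; pass iff the exit code is 0.

    Note: this isolates state, but it is NOT a security sandbox for untrusted code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "task.py"
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # hangs and infinite loops count as failures
        return result.returncode == 0


def pass_rate(tasks: List[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Return (passes, total, pass rate) over (solution, tests) pairs."""
    passes = sum(run_task(sol, tests) for sol, tests in tasks)
    return passes, len(tasks), passes / len(tasks)


# Hypothetical model outputs: one correct and one buggy solution.
tasks = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
print(pass_rate(tasks))  # (1, 2, 0.5)
```

Running each task in a separate process keeps one failing or crashing solution from affecting the others, and the timeout turns non-terminating code into an ordinary failure.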

Reproducibility Files

Repository-relative references (from this model folder):

../Making_Runeforge/evaluate_general_models.py
../Making_Runeforge/evaluate_coding_exec.py
../Making_Runeforge/eval_general_runeforge_core_200.json
../Making_Runeforge/eval_general_mistral_base_200.json
../Making_Runeforge/eval_general_leaderboard.json
../Making_Runeforge/runeforge_coding_exec_eval.json

Intended Use

General-purpose assistant and instruction-following use cases.
Higher macro accuracy than the local Mistral baseline on the sampled multiple-choice reasoning benchmarks used in this project.
Suitable as a base for additional task-specific fine-tuning where broad instruction quality is desired.

Limitations

Reported metrics are from local, sampled benchmark runs (not full official leaderboard submissions).
Quantized inference (load_in_4bit) was used for evaluation; scores may shift under different precision/runtime setups.
Expanded 5-model comparison used 30 samples per benchmark and should be treated as directional.
A separate merged artifact (runeforge_mk1_merged_from_7922) showed severe degradation: 0.0000 accuracy on every sampled general benchmark and 0/5 on the executable coding sanity check.

Evaluation Notes

The 200-sample comparison is the primary result set for this card.
The 30-sample expanded table is included for breadth across additional local peer models.
All benchmark scripts and JSON outputs are listed above for reproducibility.


Format: Safetensors
Model size: 7B params
Tensor type: F16

Model tree for BossCrafts/Runeforge_Core_mk1-7b (merge of):
- dreamgen/WizardLM-2-7B
- mistralai/Mistral-7B-v0.3
- uukuguy/speechless-code-mistral-7b-v2.0

Paper for BossCrafts/Runeforge_Core_mk1-7b: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (arXiv 2203.05482, published Mar 10, 2022)