Article Benchmark Smarter: Tailor Your Model Evaluation Suite with EvalScope Jan 22 • 7
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 360
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated about 5 hours ago • 156