Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
updated a model 3 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
published a model 3 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
updated a model 3 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Organizations
MLLM Reasoning, Rewarding, and Understanding
Papers on reasoning, rewarding, and understanding in MLLMs and LLMs
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 279
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
  Paper • 2506.02096 • Published • 52
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146
LLM4Math
- BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
  Paper • 2510.04721 • Published
- FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
  Paper • 2505.02735 • Published • 33
- PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
  Paper • 2504.18428 • Published
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs
  Paper • 2502.10197 • Published
models 227
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
8B • Updated • 20
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Text Generation • 8B • Updated • 136
shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4
Text Generation • 8B • Updated • 146
shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
Text Generation • 8B • Updated • 282
shuoxing/llama3-8b-full-pretrain-wash-c4-3-3m-bs4
Text Generation • 8B • Updated • 169
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
Text Generation • 8B • Updated • 266
shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4
Text Generation • 8B • Updated • 173
shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64
Text Generation • 8B • Updated • 183
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64
Text Generation • 8B • Updated • 199
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Text Generation • 8B • Updated • 215
datasets 7
shuoxing/yt_ugc_public
Updated • 1.37k
shuoxing/AutoTrust
Updated • 3
shuoxing/KoNViD_1k_videos
Viewer • Updated • 1.2k • 58
shuoxing/Tweet_demo
Viewer • Updated • 100 • 12
shuoxing/MapBench_VQA
Viewer • Updated • 96 • 25 • 1
shuoxing/MapBench
Viewer • Updated • 97 • 7
shuoxing/tweet-scholar
Viewer • Updated • 95 • 6