MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 13 days ago • 181
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs Paper • 2404.07584 • Published Apr 11, 2024
Feedback_Conditional_Policy Collection Collection for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638) • 7 items • Updated Jan 5 • 1