Renjie

Renjie-Ranger

https://renjie-ranger.github.io/

AI & ML interests

LLM Post-Training

Organizations

None yet

Recent Activity

updated a model about 3 hours ago

Renjie-Ranger/FCP-plus-Bootstrap_paper_table_1_version

8B • Updated about 3 hours ago

published a model about 3 hours ago

Renjie-Ranger/FCP-plus-Bootstrap_paper_table_1_version

8B • Updated about 3 hours ago

upvoted a paper 11 days ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published 13 days ago • 181

upvoted a paper 2 months ago

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 200

authored 2 papers 3 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

Paper • 2404.07584 • Published Apr 11, 2024

updated a collection 3 months ago

Feedback_Conditional_Policy

Collection

Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638) • 7 items • Updated Jan 5 • 1

updated a model 3 months ago

Renjie-Ranger/RFT-GRPO_Qwen2.5-7B

8B • Updated Jan 5

published a model 3 months ago

Renjie-Ranger/RFT-GRPO_Qwen2.5-7B

8B • Updated Jan 5

updated a model 3 months ago

Renjie-Ranger/Base-GRPO_Qwen2.5-7B

8B • Updated Jan 5

published a model 3 months ago

Renjie-Ranger/Base-GRPO_Qwen2.5-7B

8B • Updated Jan 5

updated a model 3 months ago

Renjie-Ranger/FCP-Bootstrap_Qwen2.5-7B

8B • Updated Jan 5 • 2

published a model 3 months ago

Renjie-Ranger/FCP-Bootstrap_Qwen2.5-7B

8B • Updated Jan 5 • 2

upvoted a collection 3 months ago

Feedback_Conditional_Policy

Collection

Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638) • 7 items • Updated Jan 5 • 1

updated a collection 3 months ago

Feedback_Conditional_Policy

Collection

Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638) • 7 items • Updated Jan 5 • 1
