SciDataCopilot: An Agentic Data Preparation Framework for AGI-driven Scientific Discovery Paper • 2602.09132 • Published Feb 9 • 10
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 4 days ago • 110
Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training Paper • 2603.07223 • Published 23 days ago • 13
Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets Paper • 2601.09733 • Published Dec 30, 2025 • 9
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch Paper • 2601.13606 • Published Jan 20 • 11
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility Paper • 2601.17027 • Published Jan 17 • 42
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published Jan 29 • 62
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Paper • 2512.14051 • Published Dec 16, 2025 • 47
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations Paper • 2310.07276 • Published Oct 11, 2023 • 5
MolXPT: Wrapping Molecules with Text for Generative Pre-training Paper • 2305.10688 • Published May 18, 2023 • 1
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning Paper • 2402.17810 • Published Feb 27, 2024 • 1
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey Paper • 2403.01528 • Published Mar 3, 2024 • 1
SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction Paper • 2206.09818 • Published Jun 20, 2022
3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization Paper • 2406.05797 • Published Jun 9, 2024 • 3
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining Paper • 2410.08102 • Published Oct 10, 2024 • 21