MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper
• 2402.15627
Beyond Language Models: Byte Models are Digital World Simulators
Paper
• 2402.19155
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper
• 2403.00522
Stealing Part of a Production Language Model
Paper
• 2403.06634
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper
• 2403.06504
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Paper
• 2403.06764
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper
• 2404.08801
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper
• 2404.16710
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper
• 2406.18629
Simulating Classroom Education with LLM-Empowered Agents
Paper
• 2406.19226
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Paper
• 2406.19215
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Paper
• 2406.19227
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
Paper
• 2406.19223
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Paper
• 2406.18676
Direct Preference Knowledge Distillation for Large Language Models
Paper
• 2406.19774
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper
• 2407.01284
Unveiling Encoder-Free Vision-Language Models
Paper
• 2406.11832
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Paper
• 2407.04363
Human-like Episodic Memory for Infinite Context LLMs
Paper
• 2407.09450
GAVEL: Generating Games Via Evolution and Language Models
Paper
• 2407.09388
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Paper
• 2407.10969
Better Alignment with Instruction Back-and-Forth Translation
Paper
• 2408.04614
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper
• 2408.04840