LLM - a GEONTT Collection

GEONTT 's Collections

LLM

updated Aug 26, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 36
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Paper • 2403.06504 • Published Mar 11, 2024 • 56
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Paper • 2403.06764 • Published Mar 11, 2024 • 27
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25, 2024 • 80
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Paper • 2406.18629 • Published Jun 26, 2024 • 42
Simulating Classroom Education with LLM-Empowered Agents

Paper • 2406.19226 • Published Jun 27, 2024 • 32
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

Paper • 2406.19215 • Published Jun 27, 2024 • 32
Aligning Teacher with Student Preferences for Tailored Training Data Generation

Paper • 2406.19227 • Published Jun 27, 2024 • 25
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

Paper • 2406.19223 • Published Jun 27, 2024 • 11
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

Paper • 2406.18676 • Published Jun 26, 2024 • 6
Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28, 2024 • 22
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 81
Unveiling Encoder-Free Vision-Language Models

Paper • 2406.11832 • Published Jun 17, 2024 • 54
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

Paper • 2407.04363 • Published Jul 5, 2024 • 34
Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12, 2024 • 62
GAVEL: Generating Games Via Evolution and Language Models

Paper • 2407.09388 • Published Jul 12, 2024 • 17
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Paper • 2407.10969 • Published Jul 15, 2024 • 23
Better Alignment with Instruction Back-and-Forth Translation

Paper • 2408.04614 • Published Aug 8, 2024 • 15
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9, 2024 • 33