paisleypark's Collections
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 3
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 8
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 8
Learning Universal Predictors
Paper • 2401.14953 • Published • 22
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 20
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 73
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 13
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62
Time is Encoded in the Weights of Finetuned Language Models
Paper • 2312.13401 • Published • 20
Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 20
Reasons to Reject? Aligning Language Models with Judgments
Paper • 2312.14591 • Published • 18
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Paper • 2312.13314 • Published • 8
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 13
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
CapsFusion: Rethinking Image-Text Data at Scale
Paper • 2310.20550 • Published • 27
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Paper • 2311.02262 • Published • 14
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 19
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper • 2310.15308 • Published • 23
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Paper • 2310.12274 • Published • 13
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 24
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 86
Paper • 2309.03179 • Published • 31
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 10
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 34
Semantic-SAM: Segment and Recognize Anything at Any Granularity
Paper • 2307.04767 • Published • 23
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 17
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Paper • 2307.02321 • Published • 7
CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 46