paisleypark's Collections
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 3
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 8
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 8
Learning Universal Predictors
Paper • 2401.14953 • Published • 22
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 20
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 73
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 13
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62
Time is Encoded in the Weights of Finetuned Language Models
Paper • 2312.13401 • Published • 20
Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 20
Reasons to Reject? Aligning Language Models with Judgments
Paper • 2312.14591 • Published • 18
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Paper • 2312.13314 • Published • 8
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 13
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
CapsFusion: Rethinking Image-Text Data at Scale
Paper • 2310.20550 • Published • 27
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Paper • 2311.02262 • Published • 14
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 19
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper • 2310.15308 • Published • 23
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Paper • 2310.12274 • Published • 13
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 24
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 86
Paper • 2309.03179 • Published • 31
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 10
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 34
Semantic-SAM: Segment and Recognize Anything at Any Granularity
Paper • 2307.04767 • Published • 23
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 17
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Paper • 2307.02321 • Published • 7
CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 46