Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs • Paper • 2510.18279 • Published Oct 21, 2025