• **Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining, 2503**

  • **Can Large Language Models Develop Gambling Addiction?, 2510**

  • Diffusion Transformers with Representation Autoencoders

  • Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

  • MIRA: Multimodal Iterative Reasoning Agent for Image Editing. TL;DR: performs visualized chain-of-thought (CoT) reasoning through image edits.

  • Delta Activations: A Representation for Finetuned Large Language Models