**Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining, 2503**
**Can Large Language Models Develop Gambling Addiction?, 2510**
Diffusion Transformers with Representation Autoencoders
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
MIRA: Multimodal Iterative Reasoning Agent for Image Editing. TL;DR: performs visualized chain-of-thought (CoT) reasoning through image editing.
Delta Activations: A Representation for Finetuned Large Language Models