How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Do We Need Zero Training Loss After Achieving Zero Training Error?

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Language Models are Injective and Hence Invertible
