Takes on “Alignment Faking in Large Language Models”
Alignment faking in large language models (Anthropic)
PSA: Avoid faking your Spotify Wrapped results if you’re in Congress