![[2211.00241] Adversarial Policies Beat Superhuman Go AIs](https://videoddd.com/wp-content/uploads/2024/12/221100241-Adversarial-Policies-Beat-Superhuman-Go-AIs.png)
[2211.00241] Adversarial Policies Beat Superhuman Go AIs
View a PDF of the paper titled “Adversarial Policies Beat Superhuman Go AIs” by Tony T. Wang and 10 other authors
Abstract: We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at this https URL.
Submission history
From: Tony Wang
[v1] Tue, 1 Nov 2022 03:13:20 UTC (838 KB)
[v2] Mon, 9 Jan 2023 19:53:05 UTC (6,054 KB)
[v3] Sat, 18 Feb 2023 22:05:01 UTC (6,849 KB)
[v4] Thu, 13 Jul 2023 06:37:29 UTC (4,698 KB)
2024-12-23 13:10:15