Show HN: AI at Risk, a silly LLM benchmark
ai-at-play.online
Hey HN! Thought I'd share this side project I've been working on over the last couple of weeks: 4 AI agents play the classic board game Risk, with make-believe personas (Genghis Khan doing great, Captain Jack Sparrow not so much) and randomly selected models.
I added the new "cloaked" Horizon Alpha model last week, and it has been absolutely decimating the competition (I've also just added Horizon Beta, so we'll see how it does).
It's more of a fun project than a robust experiment, but I've found the interactions really interesting. If you'd like more detail, you can also read my blog post here: https://andreasthinks.me/posts/ai-at-play/