It was 2am. My laptop chimed once and then kept chiming. A new user had just joined our nascent Discord server and sent a flurry of messages.
Ayo, just played for like 2 hours
This was amazing
Like, really amazing
I did a bard run
And just waffled my way out of everything
The way I could sing on the tavern
I made a deal with a shopkeeper to sell me his magic carpet by doing some marketing for him
I made a bandit leader cry and stop raiding a village
Had we built something people wanted?
We had just released rpgGPT, an LLM-powered text adventure game with a twist: you had a health bar, and if it went to zero you would die.
In the following weeks, thousands of people would end up playing rpgGPT. We had big red “Slap” and “Punch” buttons that players used to whatever effect they wanted. Story getting a little dull? Slap the party leader in the face and turn up the temperature.
Behind the scenes, a system of hooks tied the LLM-generated text to a lightweight game engine. This allowed for emergent storytelling:
After a particularly harrowing adventure, a character named Eldric the Bard suggested he deserved the moniker “the Brave” and the LLM was able to oblige by using a name change hook.
In another instance, a player, after transforming into a dragon, was able to use a grab item hook, originally meant for things like stealing a guard’s keys, to pick up characters and fly with them to new locations.
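For readers curious what such a hook layer might look like, here is a minimal sketch. It is not rpgGPT's actual implementation: the `[[hook:arg|arg]]` marker syntax, the hook names (`rename`, `grab_item`, `damage`), and the `GameState` and `Character` types are all assumptions made up for illustration. The idea is that the model writes markers into its narration, and the engine runs the matching handlers and strips the markers before showing the text to the player.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker syntax the LLM is asked to emit: [[hook_name:arg1|arg2]]
HOOK_PATTERN = re.compile(r"\[\[(\w+):([^\]]*)\]\]")


@dataclass
class Character:
    name: str
    health: int = 10
    inventory: list = field(default_factory=list)


@dataclass
class GameState:
    # Characters are keyed by a stable id so a rename does not break lookups.
    characters: dict = field(default_factory=dict)


def rename(state, char_id, new_name):
    state.characters[char_id].name = new_name


def grab_item(state, char_id, item):
    # Deliberately generic: "item" is free text, so the model can repurpose
    # the hook (keys, a potion, or a whole character carried by a dragon).
    state.characters[char_id].inventory.append(item)


def damage(state, char_id, amount):
    character = state.characters[char_id]
    character.health = max(0, character.health - int(amount))


# Illustrative hook registry (names are assumptions, not the game's real hooks).
HOOKS = {"rename": rename, "grab_item": grab_item, "damage": damage}


def apply_llm_output(state, text):
    """Run every recognised hook in `text`; return the narration without markers."""
    for name, raw_args in HOOK_PATTERN.findall(text):
        handler = HOOKS.get(name)
        if handler is None:
            continue  # unknown hook: skip it rather than crash the story
        try:
            handler(state, *raw_args.split("|"))
        except (TypeError, KeyError, ValueError):
            continue  # malformed arguments or unknown character id: skip
    # Remove the markers and collapse the whitespace they leave behind.
    return " ".join(HOOK_PATTERN.sub("", text).split())
```

Because handlers take loosely typed text arguments, the model can reuse a hook in ways its designer did not plan for, which is the kind of emergent behaviour the two examples above describe.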
Players quickly realized it was more entertaining to break story rules than to follow them. Some players would summon health potions out of the air, and others would stack barrels of oil and then light them on fire to create Rube Goldberg machines of death.
Despite the successes of rpgGPT, we had open questions about what its next act should be. While power users delighted in the open-ended nature of rpgGPT, many players coming from traditional video games had two complaints:
It’s too easy. If I can manipulate the whole universe, where does the challenge come from?
It’s too hard. The world is too open-ended; it’s mentally exhausting figuring out what I should do next.
Building Off Traditional Games
A potential solution to both problems is incorporating traditional game mechanics. A defining feature of games is their ability to immerse players in a “flow state”1, an equilibrium between being overwhelmed and being bored. In game design this is often just called “flow”.
While holding an audience's attention matters in other creative mediums (a mystery book would be frustrating if it were 90% red herrings, a TV show would be exhausting if every 2 minutes was a cliffhanger), flow defines games because games are interactive.
Fun in traditional games is tied very closely to the idea of challenge. Pac-Man wouldn’t have been considered fun if the ghosts ate you the moment you popped in your quarter. But it also wouldn’t have been fun if it were so easy that even your worst friend could beat your high score.
The fidelity and scope of games have blown up over time, but the throughline of Pac-Man (1980), Doom (1993), Skyrim (2011) and Elden Ring (2022) is flow.
We looked to leverage traditional game mechanics in our follow-up to rpgGPT, Talking to Monsters, a Pokemon-like game where you could choose to talk to monsters or fight them.
Talking to Monsters had a Pokemon-style overworld that players could walk around in, but the moment a monster popped out of the wild grass, the game would switch to a turn-based, LLM-powered experience. Monsters were, well, monsters, so we had to figure out a way to make them easy to talk to. We wanted to avoid situations where players didn’t know what to say next. We landed on a weird fantasy Jerry Springer mashup: monsters were going through nasty divorces, were stuck in love triangles, or had baby momma drama.
The fighting loop that Talking to Monsters drew from has lots of prior art that successfully implemented flow. But the core loop of monster collectors2 (fight monsters, level up monsters, fight stronger monsters) proved to be in tension with the amount of freedom we wanted to provide players based on our rpgGPT experience.
Some players wanted to play Talking to Monsters like a normal monster-collector game; other players were frustrated whenever they ran up against a limitation imposed by the traditional game mechanics. In one instance a player speedran to an elite boss and used the LLM interaction mode to summon an army of 300 soldiers to take on the boss for them while they stood in the shadows. In rpgGPT this would have been considered a great success, the story of the successful jailbreak being a reward on its own. Yet in Talking to Monsters it was just another instance of giving players an opportunity to optimize the fun out of a game.3
If a player approached Talking to Monsters exactly like Pokemon, they could in theory avoid ever having to write at all. If a player approached Talking to Monsters solely as a jailbreak exercise4, they could sidestep having to learn the combat system. While both modes of play were present in Talking to Monsters, they seemed contradictory rather than mutually reinforcing.
There seems to be a conflict at a primitive level between asking a user to use traditional game inputs (mouse, joystick, timed button presses) and asking a user to creatively engage with an open text input bar.5 If flow is an equilibrium state, an open text input bar destabilizes the equilibrium.
Leaning Into Less Game-y Games
We knew we wanted to give players more freedom; two genres seemed like plausible candidates: sandboxes and party games.
A good sandbox gives a player interesting tools and gets out of the way of the player. There is a range of what it means to be a sandbox. Garry’s Mod has no predefined goals for the player; Hitman has goals (kill your target), but the fun comes from the creative ways the player can satisfy that objective without predefined instructions.
An attractive property of LLMs when it comes to game design is the infinite combinability of content.6 Hitman levels are known for their replayability. Could we build a Hitman-adjacent game loop that leveraged LLMs’ strengths?
Inspired by Hitman, we developed Snatched, a game where you play as an alien and have to lure humans back to your spaceship.
Early unreleased versions of Hitman had intelligent NPCs that would respond to the player character in dynamic ways. Hitman’s development team, during playtesting, found that dumber NPCs, performing actions on a predictable loop, better set up the player for interesting assassinations. One of these predictable behaviors was reacting to the player character based on disguises (think Clark Kent-like disguises: I am now wearing a suit and glasses, so treat me like a journalist instead of Superman; I am now wearing a white lab coat, so I am clearly a doctor, not a professional bald assassin with a barcode on my head).
Disguises in Hitman telegraph potential assassination ideas to the player. For example, if you see a chef walking in and out of a Michelin-star restaurant and the target is an oligarch, maybe you can lure them to the restaurant, shove the chef in a storage closet, and poison their food.
In LLM games, telegraphing possibilities to the player is even more important because of the free text input. In Snatched, we gave the alien body-snatching powers. We found that, without any priming, many users would walk up to a target and fall back on generic lines like “I need your help in the back alley” or “There’s been a car accident! I need your help!” However, when we readily supplied players with increasingly ridiculous humans to body-snatch, they were primed to be more creative and had more fun doing it. If a player is puppeteering Santa Claus, they may threaten a character with no presents for Christmas or (real user example) claim they need help calming Rudolph the Red-Nosed Reindeer down from a cocaine binge.
In a traditional game with an LLM bolted on, reducing hallucination becomes a development goal. In an LLM-native sandbox game, hallucination is the feature. The funnier or more absurd it is, the better the user experience. In Snatched, we doubled down on this by adding even more ridiculous, out-of-place characters. Abducting humans as an alien who has body-snatched Santa is fun, but abducting humans as an alien with the body of Joseph Stalin, who begins all conversations with “In Soviet Russia…,” is even better.
We had plans to add more Hitman-like features to Snatched, things such as special abilities linked to certain characters and unlockable locations. However, players’ love of Snatched seemed to come from a different source of power than even other sandbox games. Snatched was less of a sandbox game than an improv stage.
We noticed a growth pattern with Snatched: if someone signed up to play, typically at least another user from the same region would sign up right behind them. Our working theory was that people were playing Snatched with friends, and our own playtesting suggested this was the best way to play.
Leaning into the idea of AI-enabled improv, we developed Snatched Party, a Jackbox-style party game. Party games are not really games; the core loop is being funny with friends rather than something that can turn into flow. The party games Snatched Party draws influence from have their own free text input boxes, so how do they solve the two problems we kept running into: the lack of difficulty and the overwhelming nature of open-ended input?
Successful party games sidestep the difficulty problem entirely. Player expectations are a powerful tool; if there is no expectation of difficulty or skill, it is literally just not a constraint anymore. An added bonus: sidestepping difficulty provides additional tools for solving the other problems.
In Snatched Party, an alien game host presents players with a prompt that they have to answer with as funny an answer as possible. If a player does not feel like they have a good answer, an in-game phone-a-friend feature gives them access to three AI comedians that prime them. Is this cheating? Maybe, but this kind of cheating is not optimizing the fun out of a game; it is injecting fun into it.
The less game-y a game is, the better equipped it is to take advantage of the thing LLMs are uniquely equipped for: dynamic content creation.
Embracing Not a Game
Many game developers have argued some form of “people are lazy and asking them to write text is doomed to fail,” but young people write thousands of messages across texting and messaging apps per month.7
The truth is much more context-dependent. In traditional games, where flow is their source of power, player input of free text is a bad fit8. If flow is not the goal, there is nothing intrinsically problematic about free text; players just have to be primed. Texting and messaging apps have a sort of social priming that keeps users engaged.
Priming is the answer to the rpgGPT complaint that “It’s too hard. The world is too open-ended; it’s mentally exhausting figuring out what I should do next.”
Excitingly, the latest generation of LLMs, released only in the last couple of months, is really good at priming through reasoning and agentic capabilities. Multiple agents can manage a story proactively, looking for opportunities to make it more interesting and taking the mental load off the player.
User expectations matter. “It’s too easy. If I can manipulate the whole universe, where does the challenge come from?” is a complaint that really only makes sense in the context of a game.
Other mediums don’t have this expectation. What would it mean for a book9, show, or movie to be deemed too easy? When expanding our view from traditional games to entertainment in general, the real issue to tackle isn't difficulty. It's boredom. Luckily, the tools we use to prime the player can also be leveraged here.
The lens of flow has described what made 35+ years of games successful. For new AI experiences to mature into their own medium, the first step is embracing their not-a-game nature.
Monster collectors weren’t the only core loop we experimented with. We also developed Adventure Together, a Paper Mario-like side-scroller with simulated co-op that focused on performing combo moves with an AI friend. One idea I was particularly attached to when developing Adventure Together was that if players piss off their AI friend, the AI friend should be able to choose to leave the party.
Raph Koster’s Theory of Fun for Game Design (2004) is the earliest recorded version of this idea I could find. Highly recommend the book.
It’s possible that jailbreaking LLMs could itself be a game loop. See Patrick Blumenthal’s Yudbot experiment. The challenge here is that players want to feel like they’re Sherlock Holmes; they don’t want to have to literally be Sherlock Holmes (or perhaps Pliny the Liberator in this case) in order to play the game.
Neal Agarwal’s Infinite Craft is an AI game that manages to sidestep the free input problem by making the input mechanism dragging and dropping craftable materials together. Constraining the input mechanism seems to be the key here. Note how Infinite Craft successfully adopts the core loop of non-AI games such as Little Alchemy and Doodle God.
You have to be careful to not take this idea too seriously. There’s not a content scarcity problem. Steam data suggest the average game has a 16% completion rate. Xbox data confirms a <20% completion rate. It’s probably better to think about LLMs as unique creation tools rather than infinite creation tools.
If the core loop of Jackbox-style party games is be-funny-with-your-friends, then the group chat is probably the precursor to this entire genre.
Middle-earth: Shadow of Mordor isn’t an LLM game, but its Nemesis System uses procedural scripting to generate emergent storytelling based on in-game player events like dying or defeating enemies. AI games that feed the LLM through input methods other than free text are promising, and for projects that don’t want to give up wholesale on being a game, this is probably the ripest area for experimentation.
The Choose Your Own Adventure series is the sixth best-selling book franchise of all time. Being able to cheat was probably part of the fun.