I have been using LLMs in my game design workflow for over a year now. Not as a replacement for creative thinking, but as a thinking partner that is always available, never tired, and occasionally brilliant.
The strongest use case is ideation at volume. When I am designing a new game mechanic, I can describe the constraints to an LLM and ask for twenty variations. Most are mediocre. Three or four are interesting. One might be something I would never have considered. That one idea is worth the entire conversation. The LLM is not the creative. It is the brainstorming partner who throws out enough ideas that a few stick.
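As a sketch, the "twenty variations" ask can be templated so the constraints and the count are always explicit. Everything here is illustrative: the function name, the constraint list, and the output format are my own placeholders, not a fixed prompt recipe.

```python
def ideation_prompt(mechanic: str, constraints: list[str], n: int = 20) -> str:
    """Build a constrained ideation prompt asking for many distinct variations.

    The framing (constraints up front, explicit count, a distinctness rule)
    is what makes skimming for the few good ideas practical.
    """
    bullet_list = "\n".join(f"- {c}" for c in constraints)
    return (
        f"I am designing a game mechanic: {mechanic}.\n"
        f"Hard constraints:\n{bullet_list}\n"
        f"Give me {n} distinct variations. For each, one sentence on "
        f"what makes it different from the others. Do not repeat ideas."
    )

prompt = ideation_prompt(
    "a grappling hook with momentum preservation",
    ["2D side view", "no upgrades", "single button input"],
)
```

Paste the result into whatever model you use; the point is that the structure of the ask, not the model, is what produces the three or four interesting answers.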
NPC dialogue is another win. For narrative games, writing hundreds of dialogue lines is tedious and expensive. I write the key story beats and character descriptions, then use the LLM to generate dialogue drafts. The voice is rarely perfect on the first pass, but it gives me a scaffold to edit rather than a blank page to fill. Editing is faster than writing, and the LLM is good at maintaining consistency across a character's lines once you establish the tone.
Level design ideation is where things get interesting. I describe a game's core mechanic - say, a grappling hook with momentum preservation - and ask the LLM to suggest level layouts that would test specific skills. It cannot visualize geometry, but it can reason about challenge progression. Easy rooms that teach the basic swing. Medium rooms that introduce timing. Hard rooms that combine swinging with enemy avoidance. The structural thinking is sound even if the spatial details need human refinement.
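One way to make that structural thinking checkable is to encode each room as the set of skills it exercises, then verify that no room introduces more than one new skill at a time. The room names and skill tags below are hypothetical stand-ins for the grappling-hook example.

```python
# Hypothetical progression for the grappling-hook example: each room
# is (name, set of skills it requires).
ROOMS = [
    ("tutorial_swing", {"swing"}),
    ("timed_gaps", {"swing", "timing"}),
    ("patrol_hall", {"swing", "timing", "enemy_avoidance"}),
]

def progression_ok(rooms) -> bool:
    """Each room may introduce at most one skill the player hasn't been taught."""
    taught: set[str] = set()
    for _, skills in rooms:
        if len(skills - taught) > 1:
            return False  # room demands two or more untaught skills at once
        taught |= skills
    return True
```

A check like this is exactly the kind of thing you can run over an LLM-suggested progression before you spend any time on the spatial details.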
Now, where LLMs fail: game balance. I have tried using them to tune difficulty curves, enemy stats, and economy systems. The output sounds reasonable but falls apart in playtesting. Game balance is an emergent property of dozens of interacting systems. You cannot reason about it from first principles; you have to simulate it or play it. LLMs do neither.
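To make "simulate it" concrete, here is a toy Monte Carlo sketch: noisy player damage against a fixed enemy HP pool, estimating the median and worst-case time-to-kill. All the numbers are made up; the point is that distributions like these come from running the system, not from reasoning about it.

```python
import random

def time_to_kill(enemy_hp: float, dps_mean: float, dps_sd: float, dt: float = 0.1) -> float:
    """Simulate one fight: noisy player damage per tick until the enemy dies."""
    t, hp = 0.0, enemy_hp
    while hp > 0:
        hp -= max(0.0, random.gauss(dps_mean, dps_sd)) * dt  # no negative damage
        t += dt
    return t

def estimate_ttk(enemy_hp: float, dps_mean: float, dps_sd: float, trials: int = 2000):
    """Return the median and 95th-percentile time-to-kill over many simulated fights."""
    times = sorted(time_to_kill(enemy_hp, dps_mean, dps_sd) for _ in range(trials))
    return times[trials // 2], times[int(trials * 0.95)]

random.seed(0)  # reproducible runs while tuning
median, p95 = estimate_ttk(enemy_hp=100, dps_mean=25, dps_sd=10)
```

Even a sim this crude answers a question an LLM cannot: not just the average fight length, but how bad the unlucky tail is, which is usually what playtesters actually complain about.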
They also struggle with systemic design - the kind of design where player actions in one system cascade into another. If my inventory system affects my crafting system which affects my economy which affects difficulty, the LLM cannot model those interactions reliably. It will give you a plausible-sounding answer that ignores second-order effects. This is dangerous because it sounds right.
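A toy model of such a cascade, with entirely made-up coefficients: inventory size caps crafting throughput, crafting income feeds the economy, and accumulated income shifts an abstract difficulty index. Even this trivial chain has a second-order effect that a plausible-sounding single-system answer would miss.

```python
def simulate_cascade(inventory_slots: int, days: int = 30) -> float:
    """Toy model: more slots -> more crafting -> more gold -> easier game.

    All coefficients are invented for illustration; the point is that
    a buff to one system (inventory) shows up two systems away (difficulty).
    """
    gold = 0.0
    for _ in range(days):
        crafted = min(inventory_slots, 10)  # crafting throughput capped by slots
        gold += crafted * 5                 # each crafted item sells for 5 gold
    # difficulty index drops as the player out-earns the economy baseline
    return max(0.0, 1.0 - gold / 3000.0)
```

Doubling inventory from 4 to 8 slots here noticeably lowers difficulty, even though "inventory size" and "difficulty" never touch each other directly in the code. That indirection is exactly what the LLM's plausible answer glosses over.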
My workflow now is clear: LLMs for divergent thinking, humans for convergent thinking. Use the model to explore the possibility space. Use your judgment and playtesting to narrow it down. The model generates. You curate. The model suggests. You validate. If you treat the LLM as an oracle, you will ship mediocre games. If you treat it as a prolific but unreliable collaborator, you will ship better games faster.
One tactical tip: be specific in your prompts. Do not ask 'give me game ideas.' Ask 'give me five mechanics for a 2D puzzle platformer where the core verb is reflection and each mechanic should introduce one new constraint.' Specificity gets you useful output. Vagueness gets you generic lists you could have written yourself.
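The specific prompt from that example can be kept as a reusable template with explicit slots, which forces you to fill in the count, genre, and core verb every time rather than sliding back into vagueness. A minimal sketch; the template variable is my own naming:

```python
# Reusable ideation template: every slot must be filled, so the prompt
# can never degrade into "give me game ideas".
SPECIFIC_PROMPT = (
    "Give me {n} mechanics for a {genre} where the core verb is {verb}. "
    "Each mechanic should introduce exactly one new constraint."
)

prompt = SPECIFIC_PROMPT.format(n=5, genre="2D puzzle platformer", verb="reflection")
```

Keeping a handful of these templates around turns the specificity tip from a habit you have to remember into a default you get for free.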
