A ghostless machine
Will AI poetry ever not be bad?
Poetry written by AI is notoriously bad. While LLMs have generally gotten better at writing prose, poetry has continued to elude them. Not that ChatGPT can’t write verse, of course, but that the kind of verse it writes, left to its own devices, is so contrived that it’s noticeably bad. It reads, like all bad poetry, as stilted and formulaic, lacking in imagination and cognitive power. What is passable in its prose becomes intolerable in its poetry.
The assumption is that this is only temporary, as Sam Altman and Tyler Cowen discussed on a recent podcast. Eventually the technology will progress enough that a simple prompt will output something on par with Shakespeare or Dante. Neruda is the poet they mention, interestingly, and what they really mean is Neruda in translation. Cowen says they’ll reach a level 7 or 8 Neruda, but not a level 10 Neruda. Altman says they’ll reach a level 10 Neruda, but people won’t really care. On a scale of 1-10, I find Altman to be a mid 6.
Numbers aside, generative models have already advanced significantly in other fields like music and visual arts. AI poetry by comparison seems positively primitive.
Matt Garland pointed out that this is because there’s no money in poetry, and therefore no incentive in training LLMs to improve the quality of their verse. This answer seems compelling. Perhaps all that’s needed is to tokenize some elements of verse hitherto overlooked in the training data, and train models to attend to these new vectors. But the money to do it simply isn’t there, and so AI continues to be bad at writing poetry. Huck and I agree that it fails pretty spectacularly at lineation. It seems to have no sense of the interplay between line and sentence except for the plainly grammatical. It frequently mixes metaphors and can’t extend them beyond their phrasing. And it lacks a sense of long context dependencies that, in prose, gives it the appearance of sustained thinking. Perhaps these issues are merely a matter of poor parameters. But then again poetry itself is poorly understood.
To bring an example to bear on the problem, here’s a poem I asked ChatGPT to write. I kept the prompt very broad so as to leave the bot to its own devices and see what it could create “on its own.”
I told it to “write me a poem about philosophy.”
Of course — here’s an original poem about philosophy.

The Lantern of Why

Philosophy begins the moment
a child asks why,
and the world, startled,
must pause before answering.

It lives in the hush between thought and breath,
where wonder lights a match
against the dark of certainty.

It is the ancient ache
to follow questions like constellations —
patterns stitched into the night
so our searching has direction
without destination.

Philosophy knocks on doors
we built ourselves,
and when no one answers,
it keeps knocking.

It is stubborn hope —
that truth is not a single stone
but a riverbed of possibles,
shaped by every mind that dares to dive.

In its quiet rebellion,
it teaches us that not knowing
is not a wound but a window.

And so we look through —
again and again —
letting the lantern of why
pull us into the next horizon of thought.
That’s probably a 3 or a 4 Neruda. Definitely no “Ode to a Lemon.”
I don’t pay for ChatGPT, but I believe this poem was generated with the Plus model, since you get some free credits each day to use the more advanced version. Either way, “The Lantern of Why” is pretty bad as poems go.
A recent study that caused waves (ripples really) in the writing community asked participants to rank human and AI poetry. It concluded that the majority of people, without knowing which was which, preferred the AI to the human poetry. No doubt those same people would prefer this poem to, say, Stevens’ “To an Old Philosopher in Rome.” No doubt they think Taylor Swift is a good poet.
But this is a bad poem. The lazy personification. The rudimentary line breaks. The stanzas, each with its stable of probabilistic poetic phrases, “ancient ache,” “stubborn hope,” “quiet rebellion” (and contemporary poetry’s favorite word, “wound”), seem disconnected from each other and incoherent. And then there’s the hubris of its conclusion, the “And so” that pretends to resolve what had at no point in the poem needed resolving.
The choice of “possibles” instead of “possibility” in “truth is not a single stone / but a riverbed of possibles” seems like an obvious misstep, until one remembers that, “possibility” being the more likely word, the bot chose the less expected word. This is bad but perhaps interesting, until one realizes that a level of randomness has been built into the prediction model to keep it from always choosing the most probable word. Apparently this is done to keep the responses from sounding stilted. So it chose a worse word, not because it has no taste, but because it was intentionally deoptimizing.
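The mechanism described above is usually called temperature sampling. As a minimal sketch (the vocabulary and scores below are invented for illustration, not taken from any real model): the model assigns each candidate next word a raw score, the scores are converted to probabilities, and a temperature parameter controls how often the less probable word gets picked anyway.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, seed=None):
    """Turn raw scores into a probability distribution (softmax scaled
    by temperature), then draw one token from that distribution."""
    rng = random.Random(seed)
    scaled = [score / temperature for score in logits.values()]
    max_s = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scaled]
    total = sum(exps)
    probs = {tok: e / total for tok, e in zip(logits, exps)}
    # Weighted random draw: high-probability tokens win most of the
    # time, but a less expected token is still sometimes chosen.
    r = rng.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok, probs
    return tok, probs

# Hypothetical scores for the word after "a riverbed of ..."
logits = {"possibility": 3.0, "possibles": 1.5, "stones": 0.5}

# Low temperature: the distribution collapses onto the likeliest word.
_, cold = sample_with_temperature(logits, temperature=0.1, seed=0)
# High temperature: the odds flatten, so "possibles" gets a real chance.
_, hot = sample_with_temperature(logits, temperature=2.0, seed=0)
```

At low temperature the model behaves almost deterministically; turning the temperature up is precisely the built-in deoptimization that can surface a word like “possibles.”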
I do like the phrase “dark of certainty.” It’s better than “darkness.” If only it weren’t being illuminated by “wonder lighting a match.” (Does AI know how many fingers wonder has?)
Another intolerable personification, the image of philosophy who “keeps knocking” after it’s already knocked and received no answer, might be profound if it weren’t so inanely comical. One imagines the Traveller in De La Mare’s “The Listeners” who, having kept his word, did not gallop away on plunging hooves, but knocked until the phantoms got annoyed and was simply let in.
Reading the poem feels like the scene at the beginning of Indiana Jones and the Kingdom of the Crystal Skull where Jones realizes he’s in a dummy house that’s about to be nuked. There’s no water in the sink. The fruit is fake. The people are fake. Everything suddenly becomes suspect upon being paid the least consideration.
What the poem really lacks is what Heidegger called Saying, which is the being of language. Sartre called it “the spiritual index finger.” Saying is what language points to or gets at. Without it, we are merely voicing language. “A man may speak, and speak endlessly,” says Heidegger, “and all the time say nothing.” This is, of course, a metaphysical or spiritual point which would be lost on Altman. I suspect that even if he conceded such a point, he would still believe it possible to simulate Saying.
It’s an interesting question anyway why the bots can’t write good poetry despite being so good at everything else. Eventually, they claim, AI will get there. It’s an inevitability. If not by ChatGPT 7 or 8, then 9 or 10. It will write level 10 Neruda poems and we won’t care, to Sam Altman’s chagrin.
But for now, “The Lantern of Why” is still a bad poem.
I asked ChatGPT to make it better. Here’s the revised version, which is somehow worse.
The Lantern of Why — Revised

Philosophy begins the moment
a child asks why —
a small question that startles the world
into rethinking itself.

It lives in the hush
where thought meets breath,
where curiosity lights a match
against the dark insistence
that everything is already known.

It is the pull to trace questions
like constellations —
stars arranged by wonder,
guiding us forward
without guaranteeing arrival.

Philosophy knocks on doors
our minds have built to feel safe.
Even when silence answers,
it keeps knocking.

It believes that truth
is not one final stone to be unearthed,
but shifting ground
reshaped by every seeker’s step.

Not knowing is not a failure here —
it’s the opening of a window,
the first invitation to see differently.

And so we lean outward,
again and again —
following the lantern of why
into the next horizon of thought.
You can play this game with it, asking it to make the poem “better,” and it will compute endless variations, drawing from its embedding space new words and new phrases that map onto the previous ones. Perfectly triangulating this thing which it is not saying, and not saying in so many different ways.



I wonder if the "advance" of AI "poetry" is dependent on its users being trained to prefer its "writing" to the writings of human writers, especially writers whose work is older and/or more complex. If, say, students use ChatGPT regularly - to answer questions, write papers, etc - and working adults use it regularly in their jobs and in private life - dumb little things like composing Facebook Marketplace listings - and are not wrestling with language themselves even in little things, could this be the necessary conditioning a general public needs to enjoy machine-made "poetry"? It seems doubtful to me that a culture that required students to memorize and recite poems like Horatius at the Bridge would be taken in by this kind of verbal puffery AI produces for you. It isn't popular now, and in articles I've read about 19th century British education "learning by rote" is usually cast as a negative, but considering where self-expression has got us, one could say all that memorization of poetry and well-turned, meaning-full phrases *could* serve one (and could serve us) as a kind of armor against empty, meaningless language and empty, meaningless thinking - or even just sloppy language and sloppy thinking. It could. All that would take work on our part, and it isn't foolproof.
Anyway, really great article.
I think our age has largely lost a taste for *urgency* -- what Sayers would have seen as, yes, akin to the Holy Ghost -- in poetry -- and that happened before AI. Annie Dillard talks about highly polished surfaces with nothing underneath. Wherever prestige, or any other artistically compromised -- perhaps clique-driven -- dynamic hijacks a genuine response, it erodes the irreplaceable and irreducibly individual burden of mystery for some brand of collective meaning/polish. An environment that makes it possible for soulless creativity to pass and flourish. (I don't mind Wonder being personified? And Lewis talks about how good poetry has a fluidity in metaphor where one metaphor is constantly vanishing/morphing into another -- but it feels in the AI poem like "canned elegance".)