The Internet has been made the playground of giant financial corporations like Google who divide their time between killing people and making products that don't work, and who are now desperately cannibalising whatever they have that once did work. In their ongoing struggle to keep the wave from breaking, the technologists are in the business of developing Large Language Models (LLMs), computer programmes that steal terabytes of human-written text and remix it to mimic human language, and Text-to-Image Models (TIMs), programmes that similarly steal terabytes of human-made artwork and remix it to match text commands.
Now, let's be clear that LLMs, disingenuously hyped as "artificial intelligence", can only thoughtlessly mimic human language; they do not reason or calculate and are only adept at providing perfectly articulated and grammatically correct garbled nonsense. The TIMs spew out chimeric artwork that is likewise devoid of meaning, appealing only on a lazy, meaningless, aesthetic level.
Despite this, the bots are currently pulling text and artwork from all over the Web - no one gets a say in whether or not their work gets taken. Blogs, fansites, news sites, everything you can think of: it's all being harvested for the LLMs and the TIMs. The way the bots do this is not transparent and the rules they apply are unknown. The big corporations are the main culprits here, but there are also many smaller initiatives full of unscrupulous people who are happy to steal the work of writers and artists against their explicit wishes, in order to accelerate what they'll euphemistically call the "democratization of skills and art".
stopgap solutions
All we can do for now is pick from two solutions: we either remove ourselves and our work from the Web, which would be ceding our home ground to the corporations - or we can use the rules of this place and hope they still respect those. I have opted for the latter, and as such I use five things:
a custom file called robots.txt, served from the root of the domain - a Web convention that allows us to selectively block crawlers by name. Be sure to test your robots.txt with a validator to make sure you're not over- or underblocking (or just copy my robots.txt)!
a field in the page head: <meta name="robots" content="noai, noimageai"> which stops some bots; the noai and noimageai values are a proposed convention that still awaits broad adoption.
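For illustration, here is a minimal robots.txt sketch in the spirit of the above. The user-agent tokens below are ones the operators have published (check each operator's documentation for the current names); my actual file blocks hundreds more:

```
# Ask some widely documented AI crawlers to stay away entirely
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Everyone else (including ordinary search crawlers) may crawl the site
User-agent: *
Disallow:
```

Note the empty Disallow under the * rule: that's what keeps the site indexable by well-behaved search engines while the named bots are shut out.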
None of these precautions are foolproof; they rely on the bots abiding by the rules and conventions of the Internet, which they often don't because they are built by scum with no integrity - but it's our best option for now.
true solutions
This conundrum highlights the sad fact that for the discoverability of our websites, webmasters depend upon malignant corporations like Google, run by baby-brained tech evangelists who fuck up everything they touch. We've opted for decades to write websites that are legible to the robots so they can catalog and rank us, while their search engines have become universally and intentionally worse over time; it has become clear that we need to build our own networks. Happily, those have existed for a long time, but sadly, they've fallen out of favour. They need to be reinvigorated, which is why baccyflap.com is part of several webrings and collectives.
In due time, the corporations will abandon LLMs and TIMs for the next shiny bauble, and perhaps the search engines will become better, though they probably won't; whatever the case, we must not sit around waiting for things to magically get better. It's time to abandon the corporations. We can't keep counting on them to build good networks for us - we have to do it ourselves. As such, I decided years ago that if people can't find this site through search engines, that's too bad but I won't waste any time thinking about it. It's webrings and linkbacks all the way to a brighter future, baby!
Another thing we can do is never use this technology. While it's mystifying to me that this stuff appeals to anyone, I see plenty of people use it every day and all I can say is don't. Are you really fighting this tide of shit if you occasionally swim laps in it? Avoid it like the plague, curse its name. To prove how strongly I feel about this I've started the no ai webring, which contains only sites that use no AI, no way, no how. See you there!
baccyflap.com's robots.txt currently looks like this; it was updated on the 22nd of September 2025 and currently blocks 310 bots. Please feel free to copy it entirely.