We are team Funnel-Web and this is our project, Pholcidae!
Inspiration
Pholcidae is the family of long-legged cellar spiders. They are passive and spin non-adhesive webs, relying instead on complex, irregular structures to trap prey, much like our project. We were inspired by the artists, writers, journalists, and others who post their work online, only to have it scraped and fed into AI training models by big corporations and government entities that ignore their explicit refusal of permission, leaving them with nothing they can do about it. So we decided to take matters into our own hands.
What it does
First, robots.txt warns web crawlers not to scrape the page. A bot that ignores the warning falls into a trap: the website contains a hidden link that a normal user would never encounter but that web-scraping bots, such as those built on Microsoft Playwright, will follow automatically. The linked page feeds the scraper randomly generated garbage text produced by a Markov chain algorithm, accompanied by low-quality AI-generated imagery with inaccurate metadata tags, to taint whatever the bot collects. The scraper is then presented with more random links that lead to even more garbage data in an endless loop. While it bounces around the labyrinth of pages, information such as its public IP address, geolocation, hardware specs, and actions within the website is recorded and monitored so that site owners can figure out who is scraping their pages and block their IP ranges.
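To give a rough idea of how the garbage pages could be produced, here is a minimal sketch in plain Python. It is not our actual project code: the function names (`build_chain`, `generate_text`, `trap_links`), the order-2 word chain, and the `/archive/` URL prefix are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(corpus: str, order: int = 2) -> dict:
    """Map each run of `order` words to the words seen right after it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate_text(chain: dict, length: int, rng: random.Random) -> str:
    """Produce `length` words by walking the chain, restarting at dead ends.

    Assumes the chain is non-empty, i.e. the corpus had more than `order` words.
    """
    states = sorted(chain)  # sorted so a seeded rng gives repeatable output
    state = rng.choice(states)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state was never followed by anything, so jump
            # to a fresh random state and keep going.
            state = rng.choice(states)
            output.extend(state)
            continue
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output[:length])


def trap_links(rng: random.Random, count: int = 5, prefix: str = "/archive/") -> list:
    """Return `count` random-looking URLs (hypothetical prefix) for the next trap pages."""
    return [f"{prefix}{rng.getrandbits(48):012x}/" for _ in range(count)]
```

In a setup like this, a single view could answer every URL under the trap prefix with a fresh `generate_text` page and a new batch of `trap_links`, so the loop never ends no matter which link the scraper picks.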
How we built it
We used Python and Django to create the website and Microsoft Playwright to build our test web scraper.
Challenges we ran into
We had to learn new skills and libraries in areas we were previously unfamiliar with, and we had to reason through how real web scrapers would behave and interact with the website.
Accomplishments that we're proud of
We're proud that we got the base functionality set up quickly and were able to include more features than we originally intended, such as monitoring and fudging image metadata.
What we learned
We learned a lot about the inner workings of web scrapers, how to build and structure websites, and collaborative coding practices.
What's next for Funnel-Web
We wanted to generate the images in real time, with the Markov text serving as the prompt and with randomly selected metadata tags, but this proved too slow in practice on the hardware we have. We hope to release the boilerplate code for free on GitHub so anyone can use it to secure their Django website or make their own improvements.
Note: the deployed site works as intended, but it can't find our static folder no matter what we do, so the images don't appear. We ran out of time to fix this, so the deployed version doesn't look exactly like the localhost version.

