Funnel-Web

What the web-scraper sees inside the trap
The inconspicuous home page hiding the trap
The page that appears when a banned IP range tries to access the site
The admin page used to monitor bot traffic and ban unwanted data scrapers

We are team Funnel-Web and this is our project, Pholcidae!

Inspiration

Pholcidae is the name of a family of long-legged cellar spiders. They are passive and have non-adhesive webs, relying instead on complex, irregular structures to trap prey, much like our project. We were inspired by the artists, writers, journalists, and others who post their work online, only to have it taken and fed into AI training models by big corporations and government entities that ignore their express refusal of permission to scrape it, leaving them with nothing they can do about it. So we decided to take matters into our own hands.

What it does

First, robots.txt warns web crawlers not to scrape the site. A bot that ignores the warning falls into a trap: the site contains a hidden link that a normal user would never encounter, but that web-scraping bots, such as those built on Microsoft Playwright, will follow automatically. Once the link is followed, the page feeds the scraper random garbage text generated with a Markov chain algorithm, accompanied by low-quality AI-generated imagery with inaccurate metadata tags, to taint whatever is scraped. The scraper is then presented with more random links that lead to even more garbage data in an endless loop. While it bounces around the labyrinth of pages, information such as its public IP address, geolocation, hardware specs, and actions within the website is recorded and monitored, so the site owners can work out who is scraping their pages and block their IP ranges.
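As a rough illustration of the garbage-page idea, here is a minimal sketch of a word-level Markov chain plus a set of random onward links. It uses only the standard library rather than markovify, and the seed text, function names, and `/archive/...` URL scheme are all made up for this example; it is not our actual code.

```python
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict:
    """Map each word to the list of words observed right after it."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate_text(chain: dict, length: int, rng: random.Random) -> str:
    """Random-walk the chain to produce `length` words of plausible nonsense."""
    word = rng.choice(sorted(chain))
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # Dead end (a word only seen at the very end of the corpus):
        # restart from any known word instead of stopping.
        word = rng.choice(followers) if followers else rng.choice(sorted(chain))
        output.append(word)
    return " ".join(output)


def maze_links(count: int, rng: random.Random) -> list:
    """Hypothetical URL scheme: every link points at yet another generated page."""
    return [f"/archive/{rng.getrandbits(32):08x}/" for _ in range(count)]


# Placeholder seed text; a real deployment would train on a larger corpus.
SEED_TEXT = (
    "the spider waits in the cellar and the web hangs in the dark "
    "the crawler follows the link and the link leads to the web "
    "the web leads to the cellar and the cellar has no exit"
)

chain = build_chain(SEED_TEXT)
rng = random.Random(7)  # fixed seed only so the sketch is repeatable
page_text = generate_text(chain, 30, rng)
links = maze_links(5, rng)

print(page_text)
print(links)
```

Because every generated page carries fresh links to more generated pages, a crawler that follows links indiscriminately never runs out of URLs to visit.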

How we built it

We used Python and Django to create the website and Microsoft Playwright to build our test web scraper.
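To show why a scraper falls for the hidden link, here is a small standard-library sketch of what a naive crawler does with a page's markup. It is a stand-in for our Playwright test scraper, not a copy of it, and the example HTML, the `display:none` hiding technique, and the `/trap/start/` path are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect every href in the markup, with no notion of visibility."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative paths against the page URL, as a crawler would.
            self.links.append(urljoin(self.base_url, href))


# Hypothetical home page: two normal links and one link hidden from people.
HOME_PAGE = """
<html><body>
  <a href="/gallery/">Gallery</a>
  <a href="/about/">About</a>
  <a href="/trap/start/" style="display:none" tabindex="-1" aria-hidden="true">archive</a>
</body></html>
"""

extractor = LinkExtractor("https://example.com/")
extractor.feed(HOME_PAGE)

for link in extractor.links:
    print(link)
```

A human visitor never sees or tabs to the third link, but a crawler that queues every `href` it finds treats it like any other page.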

Challenges we ran into

We were challenged with learning new skills and libraries in areas we were previously unfamiliar with, and with reasoning out how real web scrapers would behave and interact with the website.

Accomplishments that we're proud of

We're proud that we quickly got the base functionality set up and included more features than we intended early on, such as monitoring and fudging image metadata.

What we learned

We learned a lot about the inner workings of web scrapers, how to build and structure websites, and collaborative coding practices.

What's next for Funnel-Web

We wanted the images to be generated in real time, with the Markov text serving as the prompt and with randomly selected metadata tags, but this proved too slow in practice on the hardware we have. We hope to release the boilerplate code for free on GitHub so anyone can use it to secure their Django website or make their own improvements.

Note: the deployed webpage can't find our static folder no matter what we do. The site works as intended, but we ran out of time to fix the missing images, so it doesn't look exactly like the localhost version.

Built With

apache
beautiful-soup
django
gcp
git
html
markovify
playwright
python
sqlite

Updates

Tristan Moses started this project — Oct 05, 2025 12:09 AM EDT
