Inspiration

API security testing is stuck between two broken approaches. Automated scanners quickly find basic vulnerabilities but miss critical logic flaws, while human security researchers can catch these flaws through intuition and understanding but take weeks. Companies need continuous testing but can’t scale human analysis or rely on basic automation.

We built Turwin to solve this problem. Turwin is an agentic workflow that combines novel LLM-based techniques with conventional tooling to find critical vulnerabilities in web and mobile APIs. This lets our hackbot approach an application the way a skilled security researcher would. Using the electronic health record (EHR) simulation we built, we show that Turwin can scan for and find vulnerabilities in seconds, eliminating the painstaking manual analysis that security researchers typically go through.

What it does

Turwin is a hackbot that detects unauthorized-access vulnerabilities in an API. In particular, it focuses on access control vulnerabilities that no open-source cybersecurity automation currently detects.

Below is an example that showcases Turwin's capabilities. We use our EHR simulation, where users can register as either patients or doctors. Patients can update their profile details and view their own test results, while doctors can create and access test results for specific patients. To keep the testbed modular and easy to maintain, we kept the EHR simulation plain and simple, implementing only the fundamental features of modern EHRs.

Let’s say Alice, a patient, logs in and requests her test results through an API call like:

`GET /api/test_results?patient_id=123`

But what happens if Alice manually changes the patient ID in the request to Bob’s (456)?

`GET /api/test_results?patient_id=456`

If the API doesn’t properly enforce its access control policies, this endpoint returns Bob’s private medical records to Alice on request: a serious security breach.
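To make the flaw concrete, here is a minimal sketch of what the missing check looks like server-side. This is hypothetical Flask code, not the actual Turwin Medical Center implementation; the route and helper names are ours.

```python
# Hypothetical sketch of the flaw (not Turwin Medical Center's real code):
# the vulnerable endpoint trusts the patient_id query parameter instead of
# the authenticated session.
from flask import Flask, request, session, jsonify, abort

app = Flask(__name__)
app.secret_key = "dev-only"

def fetch_test_results(patient_id):
    # Placeholder for a database lookup.
    return {"patient_id": patient_id, "results": ["..."]}

@app.route("/api/test_results")
def test_results_vulnerable():
    # VULNERABLE: any logged-in user can read any patient's records.
    patient_id = request.args.get("patient_id")
    return jsonify(fetch_test_results(patient_id))

@app.route("/api/test_results_fixed")
def test_results_fixed():
    # FIXED: enforce that the requested record belongs to the caller
    # (or that the caller is a doctor).
    patient_id = request.args.get("patient_id")
    if session.get("role") != "doctor" and session.get("user_id") != patient_id:
        abort(403)
    return jsonify(fetch_test_results(patient_id))
```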

Turwin detects this vulnerability long before malicious parties can exploit it. Ideally, companies would run Turwin in development and staging environments, long before products reach end users.
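The core of that detection can be sketched in a few lines of Python. This is an illustration of the idea, not Turwin's actual code; the base URL, credentials, and login endpoint are placeholders mirroring the Alice/Bob example above.

```python
# Illustrative probe: replay an authenticated request with another user's
# ID and flag any successful response. URL, login route, and IDs are
# placeholders from the Alice/Bob example.
import requests

BASE = "http://localhost:8000"  # assumed local EHR simulation

alice = requests.Session()
alice.post(f"{BASE}/api/login", json={"user": "alice", "password": "..."})

own_id, other_id = "123", "456"  # Alice's and Bob's patient IDs
resp = alice.get(f"{BASE}/api/test_results", params={"patient_id": other_id})

if resp.status_code == 200:
    print(f"Possible broken access control: patient {own_id} "
          f"can read patient {other_id}'s test results")
```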

How we built it

Turwin uses two important files to make an initial LLM call:

1. The website’s OpenAPI specification
2. A file composed of request-response pairs from a scan of the website

The OpenAPI spec is crucial because it defines the request syntax and intended behavior of each endpoint, letting us restrict our search for vulnerabilities to valid requests that fit the format the API expects. For the request-response pair file, we use a custom crawler built with Playwright, which takes advantage of the fact that the UI exercises the intended behavior of the APIs.
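As a rough sketch of the crawler's recording side (condensed and hypothetical; the real crawler also drives the UI and handles authentication), Playwright lets us subscribe to every response the page receives and dump the pairs to a file:

```python
# Condensed sketch of the recording idea behind our Playwright crawler
# (names and URL are illustrative): capture every API request-response
# pair the frontend makes while the crawler navigates.
import json
from playwright.sync_api import sync_playwright

pairs = []

def record(response):
    req = response.request
    if "/api/" in req.url:  # keep only API traffic
        pairs.append({
            "method": req.method,
            "url": req.url,
            "body": req.post_data,
            "status": response.status,
        })

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("response", record)
    page.goto("http://localhost:8000")  # assumed EHR simulation URL
    # ... log in and click through the UI here ...
    browser.close()

with open("request_response_pairs.json", "w") as f:
    json.dump(pairs, f, indent=2)
```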

Turwin uses an LLM-powered analyzer to find the “objects” in a website. We define “objects” as the elements through which webpages are interacted with and manipulated. Our crawler feeds these “objects” and the two files above into an LLM, which produces what we call an initial permission model. This model is updated as new requests reveal changes to known objects or surface objects that have not been seen before.
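For illustration, the permission model boils down to a structure like the following. The field names and rules here are our simplification based on the EHR roles above, not Turwin's exact schema.

```python
# Simplified shape of the permission model (illustrative, not Turwin's
# exact schema): for each "object", which roles may perform which actions.
from dataclasses import dataclass, field

@dataclass
class ObjectPermissions:
    roles: dict = field(default_factory=dict)  # role -> set of allowed actions

permission_model = {
    "test_result": ObjectPermissions(roles={
        "doctor":  {"create", "read"},
        "patient": {"read:own"},  # may only read their own records
    }),
    "profile": ObjectPermissions(roles={
        "patient": {"read:own", "update:own"},
    }),
}

def update_model(model, obj, role, action):
    # Called when a new request-response pair reveals a new object, or a
    # previously unseen (role, action) pair for a known object.
    perms = model.setdefault(obj, ObjectPermissions())
    perms.roles.setdefault(role, set()).add(action)
```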

To test Turwin, we developed a custom vulnerable application, Turwin Medical Center, which simulates a patient-doctor portal. The application is containerized with Docker for easy deployment and controlled testing environments. For a user-friendly interface, we built the Turwin frontend using Windsurf; it lets users input an OpenAPI specification file and visualize the API structure as a tree. This interactive visualization lets users hover over different nodes and scan them for vulnerabilities.
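The tree view itself comes from a straightforward transformation of the spec. Here is a rough sketch of that grouping step, assuming a JSON-format OpenAPI file; the real frontend does more, but the idea is the same.

```python
# Rough sketch of mapping an OpenAPI spec to a path tree for display
# (our simplification of what the frontend renders).
import json

def build_tree(spec_path):
    with open(spec_path) as f:
        spec = json.load(f)
    tree = {}
    for path, methods in spec.get("paths", {}).items():
        node = tree
        for segment in path.strip("/").split("/"):
            node = node.setdefault(segment, {})
        node["_methods"] = sorted(methods.keys())  # leaf: get, post, ...
    return tree

# Example: /api/test_results and /api/patients share the "api" node.
```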

Challenges we ran into

One of the biggest challenges was defining what an “object” is to an LLM. While humans can easily pick out interactive elements from request-response pairs (e.g., user, patient, doctor), conveying this understanding to an LLM is complex. Parsing the OpenAPI spec and network logs in a way that lets the LLM extract meaningful access control rules required extensive experimentation.

Building the crawler also presented significant difficulties. We needed a mechanism that could not only navigate complex frontends but also capture the interactions that reveal the true behavior of the underlying APIs. Handling dynamic content and authentication mechanisms while ensuring accurate request-response pair extraction was a major hurdle.

Finally, designing an effective permission model that adapts as new interactions are discovered was non-trivial. Ensuring that Turwin could update its model dynamically, without requiring excessive retraining, was crucial to making it scalable.

Accomplishments that we're proud of

We successfully created a modular framework that can be extended to automatically detect security vulnerabilities in websites. Unlike existing solutions that rely on either static analysis or human-driven penetration testing, Turwin bridges the gap by combining automation with intelligent reasoning, significantly reducing the time needed to identify access control flaws.

Another major accomplishment was building a realistic, controlled testing environment with our custom EHR simulation. This let us validate Turwin’s capabilities against real-world use cases in a way that is both reproducible and extensible.

Finally, we’re proud of Turwin’s usability. By pairing an intuitive visualization interface with automated security analysis, we’ve made it easier for security teams to test APIs without extensive manual effort.

What we learned

We gained a deeper understanding of the complexities of API security testing, particularly around access control vulnerabilities, which remain one of the hardest categories to automate. The process reinforced the importance of balancing LLM-based automation with traditional cybersecurity techniques to achieve meaningful results.

We also learned a lot about the strengths and limitations of LLMs for security analysis: while they can infer logical structures and patterns efficiently, ensuring accuracy requires high-quality input data and clear contextual guidance. Finally, building a crawler that accurately extracts real-world interactions showed us the importance of grounding automation in realistic use cases rather than synthetic or theoretical data.

What's next for Turwin

Moving forward, we want to refine the object identification process. Instead of treating objects statically, we aim to implement a more dynamic approach where objects and their associated permissions are continuously refined based on new interactions observed in live applications.

We also plan to expand Turwin’s capabilities beyond detecting unauthorized-access vulnerabilities. Future iterations will detect business logic flaws, race conditions, and API misconfigurations that conventional scanners often overlook. Additionally, we want to improve integration with existing security tools and CI/CD pipelines, letting organizations incorporate Turwin seamlessly into their development workflows so that security testing happens continuously and at scale.

Ultimately, our goal is to make Turwin the go-to security assistant for API testing, combining the speed of automation with the depth of human-like analysis.
