
Inspiration

Flaky tests are one of the most frustrating problems in modern software development. They waste time, break pipelines, and cause teams to lose trust in their testing process. We’ve all seen that one test that randomly fails for no apparent reason, and that pain is what inspired us to create FlakyTestX. We wanted to build a tool that not only detects flaky tests but also helps teams understand and fix them intelligently, efficiently, and automatically.

What it does

FlakyTestX is an AI-powered test quality tool that detects, analyzes, and recommends fixes for flaky tests in your codebase. It works in three core stages:

  1. Detection: Runs your test suite multiple times, calculates a flakiness score for each test, and logs patterns of instability.
  2. AI-Powered Analysis: Sends flaky test logs to an AI model that determines likely causes such as race conditions, shared state, or timing issues.
  3. Actionable Recommendations: Returns specific insights and even code suggestions to help developers resolve test instability quickly.
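
The detection stage can be illustrated with a minimal sketch. The write-up does not say how FlakyTestX computes its flakiness score, so the formula below is an assumption: a test that always passes or always fails scores 0, and a test that passes exactly half the time scores 1. The function name `flakiness_scores` and the input shape (one dict of outcomes per suite run) are hypothetical.

```python
from collections import defaultdict


def flakiness_scores(runs):
    """Compute a per-test flakiness score from repeated suite runs.

    `runs` is a list with one dict per suite run, mapping a test name to
    the string "passed" or "failed" (an assumed input shape).
    """
    outcomes = defaultdict(list)
    for run in runs:
        for test, outcome in run.items():
            outcomes[test].append(outcome == "passed")

    scores = {}
    for test, results in outcomes.items():
        pass_rate = sum(results) / len(results)
        # Assumed formula: 0.0 when the outcome never changes,
        # 1.0 when passes and failures are evenly split.
        scores[test] = 1.0 - abs(2.0 * pass_rate - 1.0)
    return scores
```

With this definition a consistently failing test scores 0, so a separate check is still needed to report it as broken rather than stable.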

It also features a Streamlit dashboard that displays flakiness metrics, charts, test logs, and AI insights.

How we built it

FlakyTestX is built using Python and structured around three major modules:

  • flaky_detector.py: Executes pytest-based test suites multiple times and logs their stability behavior.
  • ai_insight_generator.py: Uses the OpenAI API (or mock responses when offline) to analyze flaky test failures and provide debugging advice.
  • dashboard.py: Built with Streamlit to visualize results in an interactive and presentable format.
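
As an illustration of the "OpenAI or mock responses" behavior described for ai_insight_generator.py, here is a rough sketch of one way such a fallback could be wired. It is not the project's actual code: `MOCK_HINTS`, `mock_insight`, and `generate_insight` are hypothetical names, the keyword rules are invented for the example, and the real model call is left as a caller-supplied function because the write-up does not show it. The sketch also assumes the API key is read from the `OPENAI_API_KEY` environment variable.

```python
import os

# Hypothetical keyword -> hint pairs used only by the offline mock path.
MOCK_HINTS = [
    ("timeout", "Possible timing issue: replace fixed waits with explicit synchronization."),
    ("connection", "Possible external dependency: stub or isolate the network call."),
    ("already exists", "Possible shared state: give each test its own fixtures."),
]


def mock_insight(log_text):
    """Return a canned hint based on simple keyword matching."""
    lowered = log_text.lower()
    for keyword, hint in MOCK_HINTS:
        if keyword in lowered:
            return hint
    return "No known pattern matched; inspect shared state and test ordering."


def generate_insight(log_text, analyze_fn=None):
    """Use analyze_fn (a caller-supplied model call) if possible, else the mock.

    The mock is used when no analyze_fn is given or when no API key is set
    (assumption: the key lives in the OPENAI_API_KEY environment variable).
    """
    if analyze_fn is None or not os.getenv("OPENAI_API_KEY"):
        return mock_insight(log_text)
    return analyze_fn(log_text)
```

Keeping the model call behind an injected function means the rest of the pipeline can be exercised without network access.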

We also included:

  • CLI support
  • .env configuration
  • JSON-based result files for CI/CD integration
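
To show how a JSON result file might feed a CI/CD step, the sketch below writes a report and returns an exit code that a pipeline could act on. The report fields, the 0.2 threshold, and the function names `build_report` and `write_report` are assumptions for illustration; the write-up does not describe the actual file schema.

```python
import json


def build_report(scores, threshold=0.2):
    """Build a JSON-serializable report from per-test flakiness scores."""
    flaky = sorted(name for name, score in scores.items() if score >= threshold)
    return {"threshold": threshold, "flaky_tests": flaky, "scores": scores}


def write_report(scores, path, threshold=0.2):
    """Write the report to `path` and return a CI-friendly exit code.

    Returns 1 if any test meets the flakiness threshold, otherwise 0.
    """
    report = build_report(scores, threshold)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
    return 1 if report["flaky_tests"] else 0
```

A CLI wrapper could pass the returned value to `sys.exit` so that a pipeline step fails when flaky tests are found.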

Challenges we ran into

  • Simulating real-world flaky test conditions in a controlled environment
  • Ensuring AI responses were context-aware and actionable
  • Optimizing test run performance
  • Designing a clear and interactive dashboard
  • Differentiating between true flaky behavior and consistent failures
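
The last challenge above, separating flaky behavior from consistent failures, can be expressed as a small classification rule over repeated outcomes. This is a simplified sketch rather than the project's actual logic; the function name `classify` and the label strings are hypothetical.

```python
def classify(results):
    """Classify one test from a list of booleans (True = passed) across reruns."""
    if not results:
        raise ValueError("at least one run is required")
    passes = sum(results)
    if passes == len(results):
        return "stable"
    if passes == 0:
        # Fails on every run: a real bug or broken test, not flakiness.
        return "consistent_failure"
    return "flaky"
```

With only a few reruns, a rarely failing test can still be labeled stable, so the number of repetitions limits how reliable this rule is.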

Accomplishments that we're proud of

  • Fully automated flaky test detection and analysis pipeline
  • AI-generated insights and fix suggestions
  • Streamlit dashboard for interactive visualization
  • Offline-ready fallback using mock AI responses
  • CI/CD-friendly CLI tools

What we learned

  • Flaky test behavior often hides in subtle patterns
  • AI can be a powerful ally in debugging and QA workflows
  • Combining traditional statistics with LLM insights can reduce dev time
  • A stable test suite is just as important as a passing one

What's next for FlakyTestX

  • Add support for JavaScript frameworks like Jest and Mocha
  • Integrate with GitHub to comment on PRs with flaky test info
  • Visualize flaky test trends over time
  • Add notifications for flakiness spikes in nightly CI runs
  • Ship a Dockerized version for enterprise use
