Inspiration

Data in its rawest form is the foundation of everything, but only if one knows where and how to look. As a team of budding developers, we often found ourselves discouraged by messy CSVs, inconsistent documentation, and cryptic Kaggle APIs. Repositories were abundant, but accessible ones were not. We realized that for beginners hoping to build real-world projects, the barrier wasn't imagination; it was data literacy.

So we asked ourselves: what if we created a safe space for those starting out, a kind of sandbox where aspiring analysts and developers could explore datasets, practice cleaning and visualization, and even be guided toward live sources, all within a single streamlined platform?

Enter: MiniDataDev.

What it does

MiniDataDev is a practice environment and dataset companion for beginner data scientists and developers. It offers:

  1. A Google Colab-compatible notebook for immediate, interactive analysis of four preloaded datasets (from sources ranging from Kaggle to the World Bank).
  2. A Python-based backend (loader.py, core.py) that handles dataset loading, cleaning, and summary generation.
  3. An AI chatbot that assists users in finding live datasets using natural language prompts (e.g., "Find me population data from Southeast Asia").
  4. A Kaggle integration via kaggle.json for seamless retrieval of datasets with a single command.
  5. Clean code structure with modules ready for extension by other teams or educational instructors.
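
To make the load, clean, and summarize steps from the list above concrete, here is a minimal sketch. It is an illustration only, not the contents of loader.py or core.py: the function names, the sample data, and the cleaning rules (trim whitespace, drop incomplete rows, drop exact duplicates) are all assumptions, and it uses only the Python standard library so it runs in any notebook without extra installs.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for one of the preloaded CSV files.
SAMPLE_CSV = """country,year,population
Vietnam,2020,97.3
Thailand,2020,
 Laos ,2020,7.3
Vietnam,2020,97.3
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Trim whitespace, drop rows with empty values, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if any(value == "" for value in trimmed.values()):
            continue  # skip incomplete rows
        fingerprint = tuple(trimmed.items())
        if fingerprint in seen:
            continue  # skip exact duplicates
        seen.add(fingerprint)
        cleaned.append(trimmed)
    return cleaned


def summarize(rows, numeric_column):
    """Return the row count plus min, max, and mean of one numeric column."""
    values = [float(row[numeric_column]) for row in rows]
    return {
        "rows": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


rows = clean_rows(load_rows(SAMPLE_CSV))
summary = summarize(rows, "population")
print(summary)
```

In this sample, the Thailand row is dropped because its population is missing, the repeated Vietnam row is dropped as a duplicate, and the padded " Laos " entry is trimmed before the summary is computed.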

How we built it

We built MiniDataDev as a modular Python project with JSON configuration and API integration. The core logic lives in the chatbot folder, where scripts handle dataset recommendations, mock project ideas, analysis explanations, and Colab notebook generation. Using API keys, the chatbot connects to OpenAI and Kaggle so users can explore live datasets.

For offline use, we included four preloaded CSV files in the data folder, so beginners can practice even without internet access. The data_loader script reads these files, and demo analyses are shown in the notebooks folder. We brainstormed features and the GUI design together, then combined everything in app.py. Supporting files such as settings.json, datasets.json, and requirements.txt manage configuration and dependencies.
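
The split between live and offline use described above could be decided at startup with a check along these lines. This is a hedged sketch rather than the code in app.py: the function names and the "live"/"offline" labels are hypothetical. The only outside facts it relies on are that the official Kaggle client reads credentials from a kaggle.json file containing `username` and `key` fields (by default under `~/.kaggle/`) or from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.

```python
import json
import os
import tempfile
from pathlib import Path

# Default location read by the official Kaggle API client.
DEFAULT_KAGGLE_JSON = Path.home() / ".kaggle" / "kaggle.json"


def has_kaggle_credentials(kaggle_json=DEFAULT_KAGGLE_JSON, env=None):
    """Return True if credentials are set in the environment or in kaggle.json."""
    env = os.environ if env is None else env
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    try:
        creds = json.loads(Path(kaggle_json).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # file missing, unreadable, or not valid JSON
    return isinstance(creds, dict) and bool(creds.get("username") and creds.get("key"))


def list_offline_datasets(data_dir):
    """Return the sorted names of the CSV files bundled in data_dir."""
    return sorted(path.name for path in Path(data_dir).glob("*.csv"))


def pick_mode(kaggle_json=DEFAULT_KAGGLE_JSON, env=None):
    """Choose "live" when Kaggle credentials exist, otherwise "offline"."""
    return "live" if has_kaggle_credentials(kaggle_json, env) else "offline"


# Demo in a throwaway folder: no credentials, one bundled CSV (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "population.csv").write_text("country,population\n", encoding="utf-8")
    mode = pick_mode(kaggle_json=Path(tmp) / "kaggle.json", env={})
    offline = list_offline_datasets(data_dir)

print(mode, offline)
```

Passing `env` and `kaggle_json` as parameters keeps the check easy to test without touching a real credentials file.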

Challenges we ran into

While building MiniDataDev, we faced some challenges using GitHub — especially around managing forks and collaborating across branches.

A key issue was merging updates: at one point, a critical script got stuck in an isolated branch, and resolving conflicts manually was time-consuming. Some team members also couldn’t access forked branches easily, which slowed down testing and reviews.

We also had trouble syncing file paths and dependencies across machines, especially early on before setting up proper configuration files.
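
One common way to avoid path mismatches like these is to build every path relative to the project's own files instead of the current working directory. The helper below is a small sketch of that idea; the name `project_path` and the example file names are hypothetical, not taken from our repository.

```python
from pathlib import Path


def project_path(anchor_file, *parts):
    """Build an absolute path relative to the folder that contains anchor_file."""
    return Path(anchor_file).resolve().parent.joinpath(*parts)


# In a real script, anchor_file would usually be __file__.
csv_path = project_path("minidatadev/app.py", "data", "population.csv")
print(csv_path)
```

Because the result is anchored to the script's location, it stays the same no matter which folder a teammate launches the script from.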

Despite the bumps, it was a great learning experience. We became more confident with pull requests, conflict resolution, and coordinating code across a remote team. These lessons directly influenced how we designed MiniDataDev to support beginner coders.

Accomplishments that we're proud of

We’re proud to have delivered a fully functional Colab tool within the span of a hackathon that:

  1. Onboards beginners to real data cleaning and exploration tasks
  2. Integrates natural language queries into dataset discovery
  3. Offers a replicable, open-source structure for others to build on

More than the code, we made a project that beginner devs like ourselves can actively use!

What we learned

The hackathon helped us learn more about collaboration and the coding process as a whole. We gained hands-on experience using Git, branching, and working with API keys to fetch data instead of manually downloading everything. We also learned how to build a working prototype under pressure, from loading datasets and structuring folders to debugging, testing, and refining features. Most of all, we saw how ideas evolve within a team: the best ones start as rough sketches that we painstakingly code into a working tool.

What's next for MiniDataDev

There are lots of ways we want to improve MiniDataDev. We'd like to expand the selection of built-in datasets and make the interface more polished!

One exciting idea is to gamify the app, where users earn badges and unlock new features as they go! We’d also love to create a built-in notebook view, where users can test code directly inside the app and get help from the chatbot in real time.

Whether it's helping students or beginner developers, we hope to keep growing this tool and making it more fun and interactive for fellow devs of all skill levels!
