Inspiration

As students trying to draw insights from multiple datasets, we found that the biggest obstacle was the hours spent on preprocessing. We wanted to lower that barrier to entry with a tool that refines datasets and suggests ways to augment them. Half, if not more, of the data processing challenge is identifying where you can merge, cut, and clean your data. With an elegant interface, Monarch hopes to bring easy data science to everyone.

What it does

Monarch is a web-based tool that lets users see the problems within their dataset and act on them, whether by merging datasets or by identifying which columns are unnecessary. Monarch is intuitive at the individual level, giving users full control over the data engineering process with just a few clicks. It is intelligent data processing at your fingertips.

How we built it

Our tech stack consists of Next.js (TypeScript) for the frontend, Express/Node.js for the backend, and Amazon S3 with NumPy/Pandas for model refinement. For styling, we used the Tailwind CSS framework, and we created the application assets with Adobe Illustrator and Adobe Photoshop.

The data processing and column-matching algorithm is a five-step pipeline, designed for adaptability. For numerical data, the algorithm used Spearman and Pearson correlation coefficients, as well as Euclidean analysis of distribution metrics. For non-numeric data, it used fuzzy syntactic matching and Latent Semantic Analysis to identify where columns could be merged.
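
To make the column-matching signals above more concrete, here is a minimal sketch of how they could be computed. It is not Monarch's actual code: the function names are hypothetical, the choice of distribution metrics (mean, standard deviation, minimum, maximum) is an assumption, "Euclidean analysis" is read here as a Euclidean distance between those metrics, and `difflib` stands in for whichever fuzzy matcher the project used. It sticks to the Python standard library so it runs anywhere, and it leaves out Latent Semantic Analysis.

```python
import math
from difflib import SequenceMatcher
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def profile(values):
    """Assumed distribution metrics for one numeric column."""
    return (mean(values), pstdev(values), min(values), max(values))


def profile_distance(x, y):
    """Euclidean distance between two columns' distribution metrics."""
    return math.dist(profile(x), profile(y))


def name_similarity(a, b):
    """Case-insensitive fuzzy similarity of two column names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical columns from two datasets that might be merge candidates.
price = [10.0, 12.5, 15.0, 20.0]
cost = [20.0, 25.0, 30.0, 40.0]
print(pearson(price, cost), spearman(price, cost))
print(profile_distance(price, cost))
print(name_similarity("customer_id", "cust_id"))
```

A high correlation, a small profile distance, or a high name similarity would each flag a pair of columns as a merge candidate. How those signals are weighted and thresholded is a design decision this sketch does not attempt to reproduce.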

Challenges we ran into

Our main challenge was connecting the various parts of the project into a cohesive application. To address this, we broke our issues down into smaller pieces so that we were not handling several complex components at once.

Accomplishments that we're proud of

We used libraries that we had never worked with before, such as React Flow for the schema visualization and FastAPI for the Python backend.

What we learned

We learned that our pre-planning could have been better. As the project went on, we realized we were solving quite a complex problem, especially once we attempted to tackle data augmentation and dataset merging. However, we also discovered that by collaborating as a team, we could accomplish a myriad of tasks that none of us could have done individually.

What's next for Monarch

We completed problematic column recognition. As a next step, we would give users more input and control over the data, such as free-response options rather than simple choice-based answers, and more agency over how the data is manipulated. Making the process iterative would allow for a more in-depth back-and-forth "conversation" between Monarch and the user!
