Inspiration
Currently, when you shop for furniture, you either go to a showroom and buy it in person (like visiting IKEA), or you buy it online and hope it looks as good as the pictures. It is hard to visualize how well a large piece will fit into your living space without taking actual measurements or seeing it in person. We wanted to solve this problem with a multi-faceted approach, with the goal of making furniture shopping as convenient and pleasant as possible.
If successful, this product will save merchants a lot of money on expensive furniture returns, which require large trucks, several movers, and more. It will also spare customers the headache of buying furniture: no more long trips to IKEA to measure pieces, and no more ordering and returning products that don't fit.
What it does
Take a picture of your living space and upload it to Modu. Then simply paste in the URL of the piece of furniture you are considering; this can be an online store URL, or anything that resembles one. In Modu, use the drag-and-drop and inpainting features to annotate where you would like the furniture to go. You can visualize multiple pieces of furniture at once.
How we built it
Frontend & Backend
The frontend was built using React and TailwindCSS, while the backend was built using Python Flask, which serves as an API.
Web Scraping
Web scraping was done via Python requests and BeautifulSoup to capture the webpage; the raw HTML is then fed into Cerebras's gpt-oss-120b model (running at ~3,000 tokens/s) to format the unstructured HTML into clean JSON for our website and database. All fetches are also cached to save time. Because Cerebras's accelerated model is so fast, it adds almost no noticeable latency to fetching, while cleanly solving the problem of scraping many different sites.
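The fetch-then-structure pipeline above can be sketched as follows. The URL-keyed cache mirrors what the text describes; `structure_with_llm` is a stub standing in for the Cerebras call (in the real app, an API request to gpt-oss-120b asking for product fields as JSON), and all function names here are illustrative assumptions.

```python
# In-memory cache keyed by URL, so repeat lookups of the same product page
# skip the network entirely (the real app caches fetches similarly).
_cache: dict[str, str] = {}

def fetch_html(url: str, fetcher) -> str:
    """Fetch a page via `fetcher(url)` (e.g. a requests.get wrapper),
    caching the result by URL."""
    if url not in _cache:
        _cache[url] = fetcher(url)
    return _cache[url]

def structure_with_llm(html: str) -> dict:
    """Stub for the LLM step: the real app sends the raw HTML to Cerebras's
    gpt-oss-120b and asks for product fields as JSON. Here we just return a
    fixed-shape record to show the expected contract."""
    return {"title": "", "price": None, "image_url": "", "source_bytes": len(html)}
```

Keeping the fetcher pluggable also makes the cache easy to test without hitting the network.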
Image Editing
We used the Gemini 2.5 Flash image-editing model, which went viral recently. However, it was actually much more difficult to use than expected: the model is quite lazy (it sometimes refuses to make edits), and positioning a 2D object for a 3D render is quite hard.
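One way to work around this kind of laziness is to detect an unchanged output and retry with a firmer prompt. The sketch below uses a stubbed `edit_image` callable rather than the real Gemini SDK; the hash-based detection and all names are our own illustrative assumptions.

```python
import hashlib

def _digest(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

def edit_with_retries(image_bytes, prompt, edit_image, max_tries=3):
    """Call `edit_image(image_bytes, prompt)` and retry with a more insistent
    prompt whenever the output is byte-identical to the input, a common
    symptom of the model "lazily" refusing to edit."""
    before = _digest(image_bytes)
    for _ in range(max_tries):
        out = edit_image(image_bytes, prompt)
        if _digest(out) != before:
            return out
        prompt = "You MUST modify the image. " + prompt
    return out  # give up after max_tries; the caller can surface an error
```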
Challenges we ran into
We spent a considerable amount of time developing a prospective feature that integrates a monocular depth estimation model to provide additional depth context, allowing users to annotate the base image in three dimensions. In particular, we tested Depth Anything V2 and MoGe. When neither model performed to expectation, we also researched generating accurate 3D environments using Simultaneous Localization and Mapping (SLAM) and motion-parallax depth perception. The core challenge was the lack of information about the camera's extrinsics and intrinsics, such as the focal length and field of view (FOV), which made the generated depth map disproportionately slanted in the y-direction due to perspective.
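The intrinsics problem can be made concrete with the pinhole camera model: back-projecting a pixel to 3D requires the focal length, and a wrong guess scales the recovered Y coordinate, producing exactly the kind of y-direction slant described above. A minimal sketch (the focal lengths and pixel values are made up for illustration):

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: map pixel (u, v) at depth Z to camera-space
    coordinates (X, Y, Z) using intrinsics (fx, fy) and principal
    point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# With the true fy, a pixel 200 px below the principal point at 2 m depth:
_, y_true, _ = backproject(320, 440, 2.0, 600.0, 600.0, 320.0, 240.0)
# Guessing fy at half its true value doubles the recovered vertical offset,
# slanting the whole reconstruction in the y-direction:
_, y_guess, _ = backproject(320, 440, 2.0, 600.0, 300.0, 320.0, 240.0)
```

Without metadata (e.g. EXIF focal length), fx and fy have to be guessed, and every depth-to-3D conversion inherits that error.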
Accomplishments that we're proud of
We're proud that we were able to get the frontend up and running within two hours and fully connect it to the backend. We are also quite proud of the aesthetic we ended up with. Personally, I was most proud of the web scraper, which lets us add products from any website via a simple fix (a Cerebras-accelerated inference LLM) that adds a near-zero-latency layer and is one of the main backbones of our product.
What we learned
Lots and lots of prompt engineering. Communicating instructions that span multiple images merged into one to a generative model became a bigger challenge than we expected.
What's next for modu
- Even more accurate image synthesis.
- Faster renders (currently ~10-15 s per edit).
- Preventing quality degradation over repeated edits: if you feed the model its own output again and again (say, asking it to return the same image 100 times), the resolution steadily decreases. This is a known issue we want to work around.




