Inspiration
What sparked this project was a question on this hackathon's submission form ("Is the project new or existing prior to April 16, 2025?"), together with my recent exposure to prior art search, which is crucial when evaluating the patentability of an invention.
It got me thinking: humanity has imagined so much throughout its evolution, but how much of it has actually been realized? How can I come up with an idea and figure out whether others have thought about it, in what directions, and whether it’s already been brought to life?
All of that inspired me to create this project. I not only had a lot of fun building it, but I also learned facts I hadn't even imagined. I hope you enjoy it too!
What it does
This project evaluates the novelty of a user-submitted idea by retrieving real-world examples of similar concepts from the web via the Perplexity API. The examples are embedded, clustered by semantic similarity, and visualized in an interactive 3D space. Each point carries a concise label, and bubble size indicates how closely that example aligns with the original idea. The result is an engaging graphical representation of the idea's novelty that lets you explore your thoughts visually.
- Each data point is a retrieved piece of content (e.g., snippet, article title, idea).
- The bubble size represents semantic similarity to the submitted idea.
- The cluster/topic reflects thematic grouping (via embedding + clustering).
- X/Y/Z axes are derived from dimensionality reduction (UMAP). Because the projection compresses the embeddings into 3D for visualization, clusters that appear close in the plot may not be as close in the high-dimensional embedding space!
How I built it
- Input: A natural language description of an idea.
- Retrieval: Uses the Perplexity API (Sonar models) to return up to 15 concise, cited examples (snippets + URLs) of how the idea has been realized or discussed. You can adjust it to return more examples if you wish.
- Summarization: Each snippet is summarized into a concise 12-word label using OpenAI's GPT-4o.
- Embedding: The original idea and all retrieved snippets are embedded using OpenAI's text-embedding-3-large.
- Clustering: High-dimensional embeddings are clustered using adaptive K-means, with the number of clusters determined via silhouette score.
- Similarity score: Computes cosine similarity between each retrieved item and the original idea; the more similar the item, the larger its bubble.
- Dimensionality Reduction: Uses UMAP to project the high-dimensional embeddings into 3D for visualization.
- Visualization: A 3D scatter plot is rendered using Plotly, with:
- X/Y/Z = UMAP-projected coordinates
- Color = Cluster assignment
- Size = Semantic similarity to the idea (larger bubbles = more similar)
- Hover = Shows summary and similarity
- Heatmap: (optional) A cosine distance heatmap of the embeddings is displayed to show semantic relationships.
- Table Output: A clickable HTML table is rendered under the plot showing:
- Summary (label)
- Full snippet
- Source (URL)
- Cluster topic
- Similarity score
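One of the trickier steps above is turning Perplexity's free-text response into distinct snippet/URL records. As a minimal sketch, the parser below assumes each example arrives as a numbered line ending in its source URL; the actual prompt and response format in the project may differ, and `parse_examples` is a hypothetical helper name. (The Perplexity API itself is OpenAI-compatible, so the request side is a standard chat-completions call against `https://api.perplexity.ai`.)

```python
import re

def parse_examples(text: str) -> list[dict]:
    """Parse a numbered list of 'snippet ... URL' lines into records.

    Assumes a hypothetical response shape where each example sits on its
    own line, e.g. '1. Some snippet text. https://example.com/...'.
    """
    records = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+\.\s*(.+?)\s*(https?://\S+)\s*$", line)
        if m:
            records.append({"snippet": m.group(1), "url": m.group(2)})
    return records

sample = """1. A startup built a solar-powered backpack. https://example.com/a
2. Researchers prototyped a similar wearable charger. https://example.com/b"""
rows = parse_examples(sample)
```

Lines that don't match the expected pattern are simply skipped, which keeps the parser robust to the model adding commentary around the list.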
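The similarity-to-bubble-size step can be sketched as follows. Toy 3-dimensional vectors stand in for the real 3072-dimensional `text-embedding-3-large` outputs, and the `min_size`/`max_size` marker range is a hypothetical choice, not the project's actual values.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def bubble_sizes(idea_vec: np.ndarray, snippet_vecs: np.ndarray,
                 min_size: float = 8.0, max_size: float = 40.0) -> np.ndarray:
    """Map each snippet's similarity to the idea onto a marker-size range."""
    sims = np.array([cosine_similarity(idea_vec, v) for v in snippet_vecs])
    # Rescale similarities to [min_size, max_size] so the most similar
    # snippet gets the largest bubble.
    lo, hi = sims.min(), sims.max()
    scaled = (sims - lo) / (hi - lo) if hi > lo else np.ones_like(sims)
    return min_size + scaled * (max_size - min_size)

# Toy vectors: the first snippet is nearly parallel to the idea,
# the second is orthogonal to it.
idea = np.array([1.0, 0.0, 0.0])
snippets = np.array([[0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0]])
sizes = bubble_sizes(idea, snippets)
```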
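The adaptive K-means step (choosing the cluster count by silhouette score) can be sketched with scikit-learn. Synthetic blobs stand in for the snippet embeddings, and `adaptive_kmeans` with its `k_max` bound is a hypothetical helper, not the project's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def adaptive_kmeans(X: np.ndarray, k_max: int = 8, seed: int = 0):
    """Fit K-means for k = 2..k_max and keep the silhouette-best model."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, min(k_max, len(X) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

# Synthetic stand-in for the embeddings: three well-separated blobs.
X, _ = make_blobs(n_samples=30, centers=3, cluster_std=0.3, random_state=42)
k, labels = adaptive_kmeans(X)
```

With well-separated blobs the silhouette score peaks at the true cluster count; on real embeddings the peak is flatter, which is why scanning a range of k values matters.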
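The projection and plotting steps combine as sketched below: UMAP reduces the embeddings to 3D, and a Plotly `Scatter3d` trace maps similarity to marker size, cluster to color, and the summary label to hover text. Random vectors stand in for the real embeddings, similarities, and labels; the marker sizing formula is illustrative.

```python
import numpy as np
import plotly.graph_objects as go
import umap

rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 50))            # stand-in snippet embeddings
sims = rng.uniform(0.3, 0.9, size=20)      # stand-in similarity scores
clusters = rng.integers(0, 3, size=20)     # stand-in cluster assignments
labels = [f"idea {i}" for i in range(20)]  # stand-in summary labels

# Project to 3D; n_neighbors must be smaller than the sample count.
proj = umap.UMAP(n_components=3, n_neighbors=5, random_state=0).fit_transform(emb)

fig = go.Figure(go.Scatter3d(
    x=proj[:, 0], y=proj[:, 1], z=proj[:, 2],
    mode="markers",
    marker=dict(size=8 + 32 * sims, color=clusters),
    text=labels,  # shown on hover
    hovertemplate="%{text}<extra></extra>",
))
```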
Challenges I ran into
The main challenges were getting structured output from Perplexity and reliably parsing it into individual, distinct ideas, and supporting both the sonar and sonar-deep-research models, which produce significantly different types of responses.
Accomplishments that I am proud of
I had a single thought and turned it into something tangible that I can now share with others, enabling them to explore and engage with the concept of ideas in a fun way. It was fascinating to see how quickly I could bridge the gap between imagination and realization with the help of LLMs.
I also like that I found a way to tightly integrate Perplexity's output by reusing the citations, making it easy for users to quickly explore the different idea sources.
What I learned
I discovered many interesting facts while testing different questions, explored various algorithms for clustering and similarity scoring, and learned the differences between the sonar and sonar-deep-research models.
What's next for Idea Novelty Evaluator
Since hackathons are meant to prove a concept rather than deliver a fully polished product, I believe I've achieved my goal. I realized through this project that similar tools have already been implemented, some likely with better precision in their specific domains. My hope for this project is simply to inspire people to explore, play, discover interesting facts, and maybe build on top of it.
Built With
- gpt-4o
- notebook
- pandas
- perplexity
- plotly
- python
- scikit-learn
- seaborn
- umap