Berry Tongue

Cerebral Beach, October 12th - 13th, 2024

Marissa Li, Zandy Zhao, Jonathan Ouyang, Andre Xiang, Clara Yee


Inspiration

Design Brief



Target Consumer


People interested in learning new languages through consuming visual media (Anime, Comics, Manga, Movies, TV shows)


Problem Statement


Learning a language is often monotonous, time-consuming, and ineffective, leading many learners to abandon their efforts to expand their linguistic range.


Design Statement


A browser extension will be created that can pause a subtitled video and display translated captions in a readable manner. The extension will also provide a word-by-word translation of the original text, so the user can see how each word or character of the original language maps to the language they already know.


Criteria


The video playing in the browser must contain subtitles in an existing language. Furthermore, the translation feature cannot be called many times in quick succession, as the API used for text recognition and translation is subject to rate limits.
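One simple way to enforce this rate limit is a cooldown between calls. The sketch below is a minimal illustration, not our production code: the `translate_frame` name, the stubbed return value, and the five-second interval are assumptions for demonstration only.

```python
import time

COOLDOWN_SECONDS = 5  # minimum gap between API calls (assumed value)
_last_call = 0.0

def throttled(func):
    """Run func only if the cooldown has elapsed; otherwise return None."""
    def wrapper(*args, **kwargs):
        global _last_call
        now = time.monotonic()
        if now - _last_call < COOLDOWN_SECONDS:
            return None  # too soon; skip this request
        _last_call = now
        return func(*args, **kwargs)
    return wrapper

@throttled
def translate_frame(image_bytes):
    # Placeholder standing in for the real translation API call.
    return "translated text"
```

Calling `translate_frame` twice in a row exercises the guard: the first call succeeds, the second is skipped until the cooldown passes.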


Constraints


Product must be made within the 24 hours of the Cerebral AI LA Hackathon

Project must be submitted via DevPost

Research Summary

Languages serve as the basis for communication throughout society.  As society grows more globalized, the need to bridge cultures through language becomes increasingly prevalent.  While there are many language-learning products on the market, a common criticism is that a great portion of them lose effectiveness, whether through the overuse of “game-like” mechanics or by failing to offer enough incentive for consumers to persist through monotonously designed tasks.

What it does

Design Specifications

The extension first passes a screenshot of the current window, which displays a Manhwa/Webtoon, through the Google Gemini API, which scans the window for decipherable dialogue. The dialogue is then translated through the same API, using a prompt that asks it not only to translate the text but also to generate a word-by-word breakdown for later use. The translated text is overlaid on the window in its intended place (below the original dialogue), and the extension lets the user view what each individual word or character of the original language corresponds to in the translated language. This overlay is performed by our extension, named BerryTongue.

Flowchart of program functionality

How we built it

Implementation 

Frontend Design

Figure: Illustration reference sheet

Initially, illustrations of the mascot were drafted, and a final mascot was chosen, as seen in the image above.  The mascot, “Bruni Berry”, is a blueberry dog with a two-tone palette of yellow and blue; these contrasting colors were chosen specifically to keep the UI color-blind friendly.

Designs were developed in Figma to help visualize the components that needed to be implemented.  In the interface, users first select which medium they prefer to learn from and then start or pause the learning sequence.

Once visualized, the Figma components were translated into HTML and CSS, allowing them to run in the Chrome side-panel extension.  The CSS was created by copying the class attributes from the Figma file, and the matching elements were defined in a separate HTML file.  As a result, although multiple pages represent each state of the program, all stylization lives in a single file, style.css.

Figure: Completed Figma panels of each frame of the Chrome sidebar

Figure: Completed UI of extension displayed as a side panel

Python Backend

The backend primarily relies on prompting the Google Gemini API to identify the words that need to be translated in a given image. These words are then recorded and translated, again by the same API. This time, however, Gemini is specifically prompted not only to translate the entire input, but also to analyze the nuances of each individual phrase and break them down into an understandable format. Both sets of data are then passed to the frontend.
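The prompting step can be sketched as follows. This is a minimal illustration, not our exact production code: the prompt wording, the `build_prompt` and `extract_and_translate` helpers, and the model name in the comment are assumptions, and the actual API call is shown in comments for shape only, since it requires a configured API key.

```python
def build_prompt(target_lang="English"):
    """Assemble the instruction sent to Gemini along with the screenshot."""
    return (
        "Find all dialogue text in this image. "
        f"Translate each line into {target_lang}, and for each line also "
        "give a word-by-word breakdown mapping every original word or "
        "character to its translation."
    )

def extract_and_translate(image_bytes, target_lang="English"):
    # Real call would look roughly like this (requires
    # `pip install google-generativeai` and an API key):
    #
    # import google.generativeai as genai
    # genai.configure(api_key="...")
    # model = genai.GenerativeModel("gemini-1.5-flash")
    # response = model.generate_content(
    #     [build_prompt(target_lang),
    #      {"mime_type": "image/png", "data": image_bytes}]
    # )
    # return response.text
    raise NotImplementedError("API key required")
```

Folding both the translation and the word-by-word breakdown into one prompt keeps the pipeline to a single API call per screenshot, which also helps with the rate limits noted under Criteria.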

The majority of the code logic was written in Python, made possible by creating a Python web server with Flask. This meant that when the backend received the data collected by the frontend, the logic applied to those values was written in Python, despite Python not being usable for frontend development. Combining JavaScript and Python gave our extension the best of both languages and considerable flexibility.

During this process, Kindo AI was used to quickly test different ideas and formats, such as translating code from Python to JavaScript.  Because it could rapidly review our code and complete sections of the program we were stuck on, Kindo AI allowed the team to develop quickly in a language we were unfamiliar with.

Connecting Frontend and Backend with Flask

The frontend (HTML and CSS) and backend (Python) were connected through AJAX requests made with jQuery, a JavaScript library; AJAX is a technique for exchanging data with a server without reloading the page. This was made possible by Flask, which runs a Python web server so that data from the frontend JavaScript can be posted to backend routes. Once the input data detected by the frontend has run through the backend's Python logic, the result is passed back to the frontend in the AJAX response.
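On the Python side, this round trip can be sketched as a single Flask route. This is a minimal illustration, assuming a hypothetical `/translate` endpoint, a stubbed translation step, and a simple JSON payload; the real extension's route names and payload format may differ.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def translate_text(text):
    # Stub standing in for the Gemini-based translation logic.
    return {"translation": f"[translated] {text}", "breakdown": []}

@app.route("/translate", methods=["POST"])
def translate():
    # The frontend posts JSON via jQuery's $.ajax; Flask parses it here.
    data = request.get_json()
    result = translate_text(data["text"])
    # The JSON response becomes the AJAX success callback's argument.
    return jsonify(result)
```

On the frontend, a jQuery `$.ajax` call with `url: "/translate"`, `method: "POST"`, and a JSON body would receive this route's response in its success handler.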

Current Product

Challenges we ran into

We started with little experience with Chrome extensions or frontend development with Flask. As a result, developing the current UI took longer than expected, making implementation frustrating at times. Throughout this process, we used Kindo AI to test different solutions quickly and to learn new languages in time to complete the project.

Accomplishments that we're proud of

Our current product is a Chrome extension capable of translating comics and other images containing text and generating a lesson breakdown based on that text.  Through this extension, users can quickly read stories in a different language while learning about the language’s grammar and nuances. This lets users learn through media they enjoy, increasing both their engagement and their commitment to the learning process.

What's next for Berry Tongue

Business Model

Market Research

Currently, the market for language learning is dominated by subscription-based apps, which require users to consistently take time out of their daily lives to study in an isolated setting disconnected from their hobbies and interests. With popular services such as Babbel and Rosetta Stone costing approximately $16 per month, we believe there is a sizable customer base that would find value in learning new languages through their existing hobbies. Within this base, we specifically target avid media enthusiasts: people who enjoy streaming services and reading comics. Given the size of this community, the number of people who would appreciate learning a new language while taking part in their hobbies, without being required to set aside extra time, makes us a viable competitor in the current market.

Future Improvements

This extension meets our basic goals for the product, but it could be more polished.  Due to inexperience with Chrome extensions, the current UI is plain HTML and CSS; it could be improved with React or a similar framework to create a more interactive application.  In addition, the separate popup window that appears may be inconvenient for the reader; this could be resolved by moving the output into the sidebar or making the popup movable.  Given time for polishing, these additions could be implemented with the help of Kindo AI.

In addition, the UI leaves room for a video learning tool, which, through the Google Gemini API, would search the window for decipherable captions. These captions would then be translated through the same API, prompted not only to translate the text but also to generate a word-by-word breakdown for later use. The translated text would be overlaid on the window in its intended place (above the original captions), with the extension letting the user view what each word or character of the original language corresponds to in the translated language.

Figure: Flowchart for Video Learning

Figure: Addition of in-text learning

The figure above shows an in-text learning feature that could be implemented in the future to make grammar comparisons easier.  In the scene, captions in the language the user is learning are displayed beside the native language, with the reverse case also being possible.  By hovering over part of the text, the user can read descriptions of the vocabulary and sentence structure.

In addition, the extension could also be used to improve the accessibility of sites by adjusting the HTML of the site to better suit Web Content Accessibility Guidelines (WCAG).  This would improve the usability of any sites accessed through the extension, ensuring users could adjust their web experience to suit personal needs such as red-green colorblindness or near-sightedness.  This could also be used for cosmetic means, such as creating dark or light modes for sites.

To improve the fluidity of the current desk pet, the character could be rigged as a Live2D model, allowing more expressive animation and a more interactive interface.  A text-to-speech feature could also be implemented to increase accessibility and offer more learning options, making learning easier for those with poor eyesight or those who learn better through sound.

To increase the range of media the extension can cover, the model could be trained to account for different formats.  For example, in manga, where pages are read from right to left, the model would need to consider the direction of text flow. This option could be selected by users beforehand or detected automatically, given sufficient training data.

Resources Used

Yuen, J. (2024, June 5). Best language learning apps - blog. Shift. https://shift.com/blog/apps-hub/best-language-learning-apps/ 

Babbel GmbH. (n.d.). How language learning with Babbel works. https://uk.babbel.com/how-babbel-works#:~:text=Got%2015%20minutes?,a%20notification%20to%20your%20phone 

Rosetta Stone Mobile Apps. Rosetta Stone® Mobile Apps | Language Learning on All Devices. (n.d.). https://www.rosettastone.com/product/mobile-apps/ 

Learn a language for free. Duolingo. (n.d.). https://www.duolingo.com/ 

Brian. (2024, May 20). HelloTalk review - make friends & practice languages. All Language Resources. https://www.alllanguageresources.com/hellotalk-review/ 

When Gamification Spoils Your Learning: A Qualitative Case Study of Gamification Misuse in a Language-Learning App. (n.d.). Ar5iv. Retrieved July 28, 2024, from https://ar5iv.labs.arxiv.org/html/2203.16175

Flask. (2010). Welcome to Flask — Flask Documentation (3.0.x). Flask.palletsprojects.com. https://flask.palletsprojects.com/en/3.0.x/
