Natural language learning via comprehensible input w/ AI !

"A puppy destroys the internet"
"A man is afraid to go over the bridge"
"In the morning, the cat wakes up and stretches"

Inspiration

In the world of short videos and wild educational promises of all sorts, we wanted to create something that makes one step in the direction that fulfils maybe both. I.e. learning something useful by watching a lot of short videos. One area in which exposure to a large amount of passive content is indispensable is language learning. Inspired by the demand for easy access, low-effort effective learning (like what duo lingo is trying to sell) we created a simple discord bot as a proof of concept of a niche in the language learning community.

What it does

It takes a short story and creates a video out of it with descriptive images and the spoken version of the story, such that in principle anyone could understand the sentence - regardless of being proficient in the spoken language or not. This is all within the paradigm of "comprehensible input", wherein the belief is that one can get a good grasp of any language by exposing oneself to lots of content in that language, which one can semantically understand. This understanding is achieved here since the image and the slowed speech give the correct context for the brain to make the connection between the spoken foreign language and the visual familiar situation.

How we built it

The discord bot passes the stories to a module which dissects them into sentences. Each sentence then is passed to a DALLE 3 api, which generates an appropriate pictorial representation of the sentence. In parallel this sentence is passed to the Google texttospeech api which produces the spoken form, slightly slowed for easy comprehension. Finally the image and audio files are concatenated to produce a

Challenges we ran into

Even though the basic functionalities are all present, of course the fine tuning and the optimisation for user is not finished yet. So when one tries to ask for too abstract stories, one still gets very confusing videos back. The challenges implementation wiese were all some small non-noteworthy issues that are just inherent with any sort of program development.

Accomplishments that we're proud of

To have a self-contained discord bot that quite reliably produces the comprehensible input videos in any language and in any style, within the scope and capabilities of DALL E 3. Even though the backend pipeline is really trivial, it is well known that the actual implementation quite often needn't be, so we are really happy that we have some finished mvp.

What we learned

That as long as one does not do too esoteric applications of ai or other software systems, there is usually some python library which can be integrated very easily. Also discord is wonderful for displaying videos directly from the source (i.e. the google cloud), hence it was very easy to implement.

What's next for AI x ALG

Three extensions are possible. The most obvious one is to let the story generation itself be handled by chatgpt, so that one can request easy stories + target language to immediately get the result, without having to go to deepl before or use his own tokens to generate stories. The next obvious one is to extend the discord bot to an app or website that is framed as an interactive language learning platform powered by AI. By doing so one could easily integrate a chatbot interface where one speaks into the microphone in the target language and receives responses from the chatbot, again in the target language. This again could be supplemented by images. The second extension is to have an object detection / action detection api running that points to the objects within the picture as the narrator is reading aloud the story. This then further helps the audience to understand what is being said.

Built With

dalle3
discod
google
openai
python
texttospeech

Updates

Felix Schwarzfischer started this project — Nov 17, 2023 06:35 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.