Motivation
How much more enjoyable would our childhoods have been if vibrant, colorful pictures had appeared whenever our teachers brought out a book to read? It has been nearly two decades since we were kids listening to our teachers read us fairytales, and with the power of AI, we want to transform that experience for the younger generation.
What it does
Educators can read a story or fairytale aloud to an audience of children, and our web application displays the subtitles in real time. In addition to this audio-to-text transformation, the application displays child-friendly depictions of the storyline as the educator progresses through the story.
All the educator needs to do is read on and leave the magic to our web application.
How we built it
We built the frontend with ReactJS, styled with Tailwind CSS and Bootstrap. Browser-based speech recognition, accessed from React, transforms the audio to text. The app is integrated with OpenAI's DALL-E 3 image generation model to produce images from the text picked up in real time.
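The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes the speech recognition is backed by the browser's Web Speech API and that images are requested from OpenAI's image generation endpoint over HTTP. The function names `collectFinalTranscript`, `buildImageRequest` and `startListening` are hypothetical.

```javascript
// Sketch only: these function names are hypothetical and do not come from
// the FairyTale.ai codebase.

// Pull the newly finalized text out of a Web Speech API "result" event.
// Interim (not yet final) results are skipped so that each image is
// generated from a finished phrase rather than a half-heard one.
function collectFinalTranscript(event) {
  let text = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      text += result[0].transcript;
    }
  }
  return text.trim();
}

// Build the fetch() arguments for OpenAI's image generation endpoint.
// The model defaults to DALL-E 3; the 1024x1024 size is an assumption.
function buildImageRequest(prompt, apiKey, model = "dall-e-3") {
  return {
    url: "https://api.openai.com/v1/images/generations",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, prompt, n: 1, size: "1024x1024" }),
    },
  };
}

// Browser-only wiring: listen continuously and pass each finished phrase
// to a callback. Returns null where speech recognition is unavailable.
function startListening(onSentence) {
  const Recognition =
    typeof window !== "undefined" &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!Recognition) return null;
  const recognition = new Recognition();
  recognition.continuous = true; // keep listening while the story is read
  recognition.interimResults = true; // allows live subtitles between phrases
  recognition.onresult = (event) => {
    const sentence = collectFinalTranscript(event);
    if (sentence) onSentence(sentence);
  };
  recognition.start();
  return recognition;
}
```

In a setup like this, the callback passed to `startListening` would call `fetch(req.url, req.options)` with the result of `buildImageRequest` and display the returned image. Passing `"dall-e-2"` as the third argument would be one way to switch to the cheaper model during testing. A production deployment would normally send the request through a backend so that the API key is not exposed in the browser.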
Possible applications
We foresee our application being actively used in preschools and kindergartens, where the visual stimulus would benefit the children's cognitive development. It could also help create a more productive learning environment, with an engaging, visual learning experience that holds the attention of young children.
Furthermore, we believe our application would also be of service to those learning a second language. Learning a second language through pictures and visuals provides a more interesting and enjoyable experience than black-and-white text.
Lastly, we also believe we could address the needs of students with learning difficulties or language-based challenges by offering a multi-sensory approach to learning.
Challenges we ran into
The main pain point we faced was choosing the right image generation model for our project. The market offers many options, such as OpenAI's DALL-E 3, Google's Gemini, Amazon's Titan and Midjourney. Balancing cost against functionality, DALL-E 3 and Gemini were the strongest candidates, with Gemini having a slight edge in the consistency of its generated images. However, Gemini's image generation capabilities are currently mostly unavailable to the public due to maintenance work. As such, we stuck with DALL-E 3 despite its higher cost and slightly inconsistent characters across images. To cut costs, we used DALL-E 2 for the majority of our testing. However, the quality gap between DALL-E 2 and DALL-E 3 was too stark for us to keep using the former permanently.
Accomplishments that we're proud of
We are extremely proud of how much we have learnt about image generation models and the potential they hold. Tinkering with the prompts to replicate our visions for the app was definitely our greatest achievement!
What we learned
We discovered how diverse the market for image generation is, and that this technology still has much more potential. Another up-and-coming skill we picked up is prompt engineering. Striking a balance in how detailed the prompt was proved key. Too many details led to the same picture being generated for every scene, while too few details caused a lack of consistency between pictures.
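One way to picture this balance is a prompt builder that puts the scene first and appends only a short, capped block of shared details. The sketch below is purely illustrative: the helper `buildScenePrompt`, the `shared` object shape, and the 25-word cap are all assumptions, not the prompts we actually used.

```javascript
// Hypothetical helper: none of these names or limits come from the
// FairyTale.ai source.
const MAX_SHARED_WORDS = 25; // assumed cap so shared details don't drown out the scene

// Combine the current scene with a short block of details that should stay
// the same in every picture (art style, recurring characters).
function buildScenePrompt(sceneText, shared) {
  const sharedText = [shared.style, ...shared.characters].join("; ");
  const sharedWords = sharedText.split(/\s+/).filter(Boolean);
  // Trim the shared block: too much of it makes every scene look alike.
  const trimmedShared = sharedWords.slice(0, MAX_SHARED_WORDS).join(" ");
  // The scene comes first so that it drives what changes between pictures.
  return `Scene: ${sceneText.trim()} Keep consistent: ${trimmedShared}`;
}

// Example usage with made-up story content.
const shared = {
  style: "soft watercolor, child-friendly",
  characters: ["a small red fox with a blue scarf"],
};
const riverPrompt = buildScenePrompt("The fox crosses a river.", shared);
const owlPrompt = buildScenePrompt("The fox meets an owl.", shared);
```

Here the two prompts differ in their scene sentence but share the same style and character description, which is the trade-off described above: enough shared detail for consistency, but not so much that every picture comes out the same.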
What's next for FairyTale.ai
We aim to expand the functionality of the app, for example by introducing the ability to control the style and feel of the generated images. Doing so would give users more control over the type of images they need for their own use.
We will also work on improving image quality, either by employing an ensemble of AI models or through more thorough prompt engineering of the prompts we feed into DALL-E 3.
We aim to expand our customer base to upper secondary institutions and above. We foresee them using our web application for their own presentations and projects, where automatically generated images would make for a more visual experience.
Built With
- bootstrap
- dalle
- javascript
- openai
- react
- speech-recognition
- tailwind