## Inspiration

In an age of global connection, we are losing our linguistic roots. Languages are dying at an alarming rate: by a commonly cited estimate, one disappears every two weeks. With over 7,000 languages spoken worldwide and nearly half of them endangered, we saw an urgent need to preserve and celebrate linguistic diversity. We were inspired by the indigenous communities and language preservationists who work tirelessly to document endangered languages. We created LingoCraft as an accessible, interactive platform where anyone can search through a vast number of human languages, learn about endangered tongues, and contribute to preserving linguistic heritage for future generations.

## What it does

LingoCraft is an interactive language archive and educational platform that brings the world's languages to life through an engaging globe interface. Users can:

- Explore an Interactive Globe: Navigate a 3D world map with pins marking locations where different languages are spoken
- Discover Languages: Search by language name to learn about languages from around the world, or go on a language adventure with our trusty language villager
- Access Language Information: View comprehensive details including alphabets, writing systems, speaker populations, endangerment status, language families, and historical origins
- Request Missing Languages: Submit suggestions for languages not yet in the archive
- Browse Without Barriers: No login is required for exploration; the platform is open to all curious minds
- Access Dictionaries: Follow direct links to language dictionaries where available

Our platform features a “language villager” avatar that accompanies users through their linguistic journey, making the experience more engaging and fun.

## How we built it

We built LingoCraft using a full-stack approach with the following technologies:

Backend:

- Python for backend logic and data processing
- Python's `requests` library to fetch publicly available data from websites
- JSON to store the ethically collected data and to serve it in response to frontend requests

Frontend:

- Responsive HTML/CSS design optimized for desktop
- Interactive globe visualization for language exploration
- User-friendly forms for language requests

Database Architecture: We used a single JSON file, `new_database.json`, as our data store. Each language's ISO code (the international standard identifier) serves as the primary key, and details such as countries, number of speakers, endangerment status, and alphabets are stored as the values. This clean, well-structured layout helped us manage inconsistencies between datasets and retrieve the information we needed quickly and efficiently.
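To illustrate the ISO-keyed layout described above, here is a minimal sketch of how such a file could be loaded and queried. The field names (`name`, `countries`, `speakers`, `status`), the helper functions, and the sample entries are assumptions made for illustration; they are not the actual schema or contents of `new_database.json`.

```python
import json
from pathlib import Path
from typing import Optional


def load_languages(path: str) -> dict:
    """Read the whole archive into a dict keyed by ISO code."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_by_iso(languages: dict, iso_code: str) -> Optional[dict]:
    """Look up one language by its primary key; returns None if absent."""
    return languages.get(iso_code.strip().lower())


def find_by_name(languages: dict, query: str) -> list:
    """Case-insensitive substring search over the (hypothetical) "name" field."""
    needle = query.strip().lower()
    return [
        (iso, entry)
        for iso, entry in languages.items()
        if needle in str(entry.get("name", "")).lower()
    ]


# Placeholder entries: the codes and values below are invented for the example.
SAMPLE = {
    "zz1": {
        "name": "Example Language A",
        "countries": ["Country X"],
        "speakers": 1200,
        "status": "endangered",
    },
    "zz2": {
        "name": "Example Language B",
        "countries": ["Country Y", "Country Z"],
        "speakers": 50000,
        "status": "vulnerable",
    },
}
```

Keying by ISO code makes the exact lookup a single dictionary access, while a search by name has to scan every entry, which is acceptable for an archive of a few thousand languages.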

Data Collection:

- Public data from Wikipedia, Wikidata, Glottolog, and similar sources for language information, alphabets, and dictionary links
- Integration of real-world geographic data and coordinates
- Compilation of population and endangerment status data

Authentication System:

- Route-level authorization to protect admin functions
- Email and password login for administrators
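As a rough, framework-independent sketch of what route-level authorization with password login can look like: a decorator rejects any call whose session is not marked as an admin session, and passwords are compared as salted hashes. The names `admin_required`, `Forbidden`, `list_requests`, and the `is_admin` session flag are hypothetical, and the real project may rely on its web framework's own mechanisms instead.

```python
import functools
import hashlib
import hmac


class Forbidden(Exception):
    """Raised when a non-admin session calls a protected handler."""


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so plain-text passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Constant-time comparison against the stored hash."""
    return hmac.compare_digest(hash_password(password, salt), expected)


def admin_required(handler):
    """Route-level guard: only sessions flagged as admin may proceed."""
    @functools.wraps(handler)
    def wrapper(session: dict, *args, **kwargs):
        if not session.get("is_admin"):
            raise Forbidden("admin login required")
        return handler(session, *args, **kwargs)
    return wrapper


@admin_required
def list_requests(session: dict) -> list:
    """Hypothetical admin-only handler returning pending language requests."""
    return ["placeholder request"]
```

Putting the check in a decorator keeps each admin handler free of repeated permission logic, and public handlers simply omit it.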

## Challenges we ran into

- Data Inconsistency: One of our biggest challenges was dealing with inconsistent and incomplete language data across sources. Wikipedia articles vary significantly in detail, and many endangered languages lack comprehensive documentation on Glottolog. Merging the two scraped datasets into one file was also difficult, because some languages present in one dataset were missing from the other.
- Globe Visualization: Implementing an interactive 3D globe that is both functional and intuitive proved technically challenging. We had to balance visual appeal against loading times and ensure that navigation worked smoothly across hundreds of potential pin locations.
- Web Scraping Reliability: Building scrapers that could handle the varying formats of Wikipedia articles and Glottolog data while still extracting consistent fields was harder than we expected. Different language pages have different structures, which required adaptive parsing logic.
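The dataset merge described above can be sketched as an outer join on the ISO code: every language found in either source is kept, gaps in one source are filled from the other, and each entry records where it came from. This is an illustrative sketch, not the project's actual merge script; it assumes both scraped datasets are dicts keyed by ISO code, and the `sources` field and function name are hypothetical.

```python
EMPTY_VALUES = (None, "", [])


def merge_datasets(wikipedia: dict, glottolog: dict) -> dict:
    """Outer-join two ISO-keyed datasets, preferring non-empty Wikipedia values."""
    merged = {}
    # Union of keys: languages missing from one source are still kept.
    for iso in sorted(wikipedia.keys() | glottolog.keys()):
        wiki_entry = wikipedia.get(iso, {})
        glotto_entry = glottolog.get(iso, {})
        entry = {}
        for field in sorted(wiki_entry.keys() | glotto_entry.keys()):
            wiki_value = wiki_entry.get(field)
            # Fall back to Glottolog when Wikipedia has no usable value.
            if wiki_value not in EMPTY_VALUES:
                entry[field] = wiki_value
            else:
                entry[field] = glotto_entry.get(field)
        # Record provenance so incomplete entries are easy to audit later.
        entry["sources"] = [
            name
            for name, present in (
                ("wikipedia", iso in wikipedia),
                ("glottolog", iso in glottolog),
            )
            if present
        ]
        merged[iso] = entry
    return merged
```

Recording provenance per entry makes it straightforward to list the languages that only one source covers, which are the ones most likely to need manual review.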

## Accomplishments that we're proud of

- Created a Functional Language Archive: Built a working platform that can genuinely help preserve and share information about world languages
- Interactive Globe Implementation: Successfully implemented an engaging, interactive globe interface that makes exploring languages intuitive and fun
- User-Centric Design: Designed an open-access platform that removes barriers to learning while maintaining necessary admin controls
- Automated Data Pipeline: Developed web scraping tools to automatically populate language information from Wikipedia and Glottolog
- Community Features: Implemented a request system that allows users to contribute to the archive's growth

## What we learned

Technical Skills:

- Web scraping techniques and handling inconsistent HTML structures
- Creating interactive map visualizations with performance optimization
- Responsive web design principles for cross-platform compatibility

Development Practices:

- The importance of MVP thinking: focusing on core features first
- How to make architectural decisions quickly under time constraints
- Effective debugging strategies when dealing with third-party APIs

Domain Knowledge:

- The complexity and diversity of world languages
- The challenges linguists face in documenting languages
- Geographic distribution patterns of language families

Soft Skills:

- Managing scope creep during rapid development
- Prioritizing user experience over feature completeness
- Making design decisions that balance functionality with simplicity
- The power of focusing on a meaningful mission to drive motivation

## What's next for LingoCraft

Enhanced Features:

- Audio Pronunciation Guides: Add recordings of native speakers pronouncing common phrases
- Learning Modules: Interactive lessons for endangered languages
- Community Contributions: Allow verified users to add translations, cultural context, and common words and phrases
- Mobile App: Native iOS and Android applications for on-the-go exploration
- Admin Features: Allow admins to add, edit, and delete languages
- Real-Time Synchronization: Let admins see a language request as soon as a user submits it

Data Expansion:

- Partner with linguistic research institutions for verified data
- Integrate with UNESCO's Atlas of the World's Languages in Danger
- Add more comprehensive tribal and indigenous language information
- Include regional dialects and variations

Preservation Tools:

- Recording submission portal for native speakers
- Collaborative translation projects
- Digital preservation of written materials in endangered languages
- Integration with academic language preservation projects

Technical Improvements:

- Enhanced search with filters by language family, endangerment level, and region
- Advanced visualization options (heat maps of language density, historical migration paths)
- Offline mode for regions with limited internet access
- API for researchers and educators to access the data

LingoCraft has the potential to become a comprehensive hub for language preservation and education, helping ensure that the world's linguistic diversity is celebrated and protected for generations to come!
