This summer, I have been interning in New York @GoogleDeepMind with the Speech team, working on reasoning in audio language models. Excited about the work and the team. If you’re around, let’s grab coffee ☕
Oreva Ahia
- I am excited to be presenting MAGNET 🧲 at NeurIPS 2024 next week. Subword tokenizers have been shown to over-segment text in non-Latin script languages. Our work presents an approach to train tokenizer-free multilingual LMs via efficient byte-level modeling. 1/n
- Happy to announce that our paper, "The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation," was accepted to Findings of EMNLP 2021! Joint work with @sarahookr and @KreutzerJulia 📜openreview.net/forum?id=qw1qq… 1/n
- 🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (avg length: 51 minutes).
- Do you use LM APIs like ChatGPT in non-English languages? You might be overpaying and your in-context learning performance may also be negatively impacted. tinyurl.com/y6hakyc8 🧵⬇️
- Thrilled to have won the Best Social Impact Paper Award at #ACL2024 @aclmeeting for our work, DialectBench! Big thanks to all my amazing collaborators who made this possible! Congratulations to @faisal_thisis @orevaahia @aarsri21 @kabirahuja004 @tsvetshop @anas_ant for winning the Social Impact Award at #ACL2024!
- One of the most interesting and challenging things I did this year was volunteering to lead the NLP track @AISaturdayLagos. We had our last class for the year today. I am happy I got to work with amazing people for 14 Saturdays, and proud to be part of this great community.
- Extremely proud to have contributed to this work 🥳 We're overwhelmingly excited to announce our work ✨ Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages ✨ is to be published at "Findings of @emnlp2020" 🌍 Our paper boasts 50 authors from across African soil and beyond 🤩🌻💪🏿 /1
- The new gpt-4o tokenizer is much fairer across languages 🔥. Glad to see major companies like @OpenAI and @cohere addressing the tokenization challenges that we recently highlighted in our EMNLP paper (aclanthology.org/2023.emnlp-mai…). We're opening up access to our new flagship model, GPT-4o, and features like browse, data analysis, and memory to everyone for free (with limits). openai.com/index/gpt-4o-a…
- Attending the @DeepIndaba in Tunisia? We are accepting submissions for spotlight talks at the ML Efficiency workshop. We are interested in work that discusses the challenges and opportunities for using ML in resource-constrained environments in Africa. bit.ly/3bV6q1W
- Unfortunately, I can't attend EMNLP in person. @shocheen will be presenting our work in poster session 5 at 9am SGT on Saturday. Come talk to Sachin about the flaws of tokenization in multilingual LMs, tokenizer-free LMs and so much more about tokenization in general!
- I had a very wonderful session with @alienelf, @jennifazor, and the @WiMLDS_Abuja community talking about Machine translation for African Languages and of course @MasakhaneMt.