Jason Phang
@zhansheng
Foundations at @OpenAI. PhD @NYUDataScience, @AiEleuther, 🇸🇬. Prev: @Google, @Microsoft
San Francisco, CA
Joined May 2009
  • Pinned
    🧵I’m excited to share not one but two research papers, written jointly by researchers from OpenAI and the @medialab at MIT. We try to answer the following question: How do interactions with AI chatbots affect people’s social and emotional well-being?
  • Nothing much, ChatGPT.
  • The field of AI moves very fast
  • I wrote a Colab notebook that showcases how to do *multi-task training* with the @huggingface Transformers and NLP libraries:
  • I wrote a minimal-ish implementation of GPT-NeoX-20B. It runs on a single GPU with 41-44GB of memory. You can use it as a reference or for easy hacking of the model. github.com/zphang/minimal… Next up: porting to Hugging Face Transformers!
  • Introducing HyperTuning: Using a hypermodel to generate parameters for frozen downstream models. This allows us to adapt models to new tasks *without* back-prop! Paper: arxiv.org/abs/2211.12485 1/10
  • In the last 24 hours:
    - EleutherAI announced it's forming a non-profit
    - LLaMA (7 - 65B) weights have been mailed out
    - Flan-UL2 (20B) weights have been released
    A good day for open science!
  • I'd like to take this chance to remind everyone that it hasn't even been a full year since o1 was announced (Sept 2024).
    1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
  • I wrote a minimal implementation of OPT. I've tested up to 66B with pipeline parallelism, should work up to 175B if you have enough GPUs. github.com/zphang/minimal…
  • "Investigating Efficiently Extending Transformers for Long Input Summarization" from my time at @GoogleAI
    - We investigate how to adapt models to perform long input summarization
    - We introduce PEGASUS-X, a long-context extension of PEGASUS
    arxiv.org/abs/2208.04347 [1/8]
  • I very quickly threw together some code for fine-tuning LLaMA. One version using PEFT+8bit, and another using (simple) pipeline parallelism for full fine-tuning.
  • Don’t ask “what do you do?” at parties. Instead ask: "Are the experiments you kicked off before coming here still running? Are you sure you configured all your jobs correctly? Did you remove *all* the PDB breakpoints?"