Ishaan Gulrajani
@__ishaan
Hi! I’m a machine learning researcher @openai. Previously @stanford @facebook @google @mila_quebec
San Francisco
Joined November 2010
Posts
  • user avatar
    OpenAI is nothing without its people.
  • user avatar
    New paper with @tatsu_hashimoto! Likelihood-Based Diffusion Language Models: arxiv.org/abs/2305.18619 Likelihood-based training is a key ingredient of current LLMs. Despite this, diffusion LMs haven't shown any nontrivial likelihoods on standard LM benchmarks. We fix this!🧵
  • user avatar
    i love the openai team so much
  • user avatar
    .@ilyasut gets it: There's very little world knowledge that is strictly impossible to learn from text, but some things are more efficiently learned through other mediums. But, I claim: the efficiency advantage of multimodal learning *increases*, not decreases, with scale.
    Replying to @10_zin_
    The scale of GPT allows it to learn about the world just by reading, despite having no eyes and ears. For example text-only GPT understands red is more similar to orange than blue, despite having never seen them. But of course, with vision you learn more and faster.
  • user avatar
    Very happy to share our work on invariance, causality, and out-of-distribution generalization! With Martín Arjovsky, Léon Bottou, David Lopez-Paz.
    Long-awaited and beautiful paper on "Invariant Risk Minimization" by Arjovsky et al. studies relationship between invariance, causality and the many pitfalls of ERM when biasing models to simple functions. Love the Socratic dialogue the paper ends with... arxiv.org/abs/1907.02893
  • user avatar
    This GAN works okay, except sometimes it draws faces on things which really shouldn’t have faces.
  • user avatar
    Super cool! Also: "we observe that training using a discriminator leads to significantly lower L2 distances than when directly minimizing L2." turns out we've been doing regression wrong this whole time 😅
    RL + GANs: Program synthesis with an agent that uses a paint program to fool a discriminator. Paper+Blog: deepmind.com/blog/learning-…
  • user avatar
    Replying to @nickcammarata
    the comedown is too rough
  • user avatar
    Replying to @mat_kelcey
    Transpose conv is faster, might work better if you have lots of data & training time because resample+conv is equivalent to transpose conv with a constraint on the weight matrix. distill.pub/2016/deconv-ch…
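
The equivalence claimed in the reply above can be checked numerically in a toy setting. This is only a sketch under assumptions that are not in the post: a 1D signal, nearest-neighbor resampling by a factor of 2, "full" padding, and hypothetical helper names (`conv1d_full`, `upsample_nearest`, `transpose_conv1d`, `tied_kernel`). Under those assumptions, upsample followed by a 3-tap conv matches a stride-2 transpose conv whose 4-tap kernel is tied to the 3-tap one.

```python
def conv1d_full(x, k):
    """Full 1D convolution: y[j] = sum_m x[m] * k[j - m]."""
    y = [0.0] * (len(x) + len(k) - 1)
    for m, xm in enumerate(x):
        for t, kt in enumerate(k):
            y[m + t] += xm * kt
    return y


def upsample_nearest(x, factor=2):
    """Nearest-neighbor resampling: repeat every sample `factor` times."""
    return [v for v in x for _ in range(factor)]


def transpose_conv1d(x, w, stride=2):
    """Transpose conv as scatter-add: each input sample stamps a scaled
    copy of the kernel w into the output at offset i * stride."""
    y = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for t, wt in enumerate(w):
            y[i * stride + t] += xi * wt
    return y


def tied_kernel(k):
    """Transpose-conv kernel equivalent to (2x nearest upsample, then conv k).
    For k = [k0, k1, k2] this is [k0, k0 + k1, k1 + k2, k2]: four taps with
    only three free parameters, which is the weight constraint in question."""
    return conv1d_full(k, [1.0, 1.0])


# Illustrative values only (assumptions, not taken from the post).
x = [1.0, -2.0, 3.0, 0.5]
k = [0.25, -1.0, 2.0]

resample_then_conv = conv1d_full(upsample_nearest(x), k)
constrained_transpose = transpose_conv1d(x, tied_kernel(k))

print(resample_then_conv)
print(constrained_transpose)
```

An unconstrained transpose conv would be free to choose all four taps independently, so in this toy setting it can represent everything resample+conv can, plus more.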
  • user avatar
    Replying to @sedielem
    Fun fact that didn’t make it into the paper: when we started the project, the gap was 10,000x 😳. It took stacking many 10% improvements to make it this far.
  • user avatar
    Quasi-Recurrent NNs: masked gated convs + fast elemwise-only recurrence, *16x faster* than LSTM! @jekbradbury et al. openreview.net/forum?id=H1zJ-…
  • user avatar
    Replying to @sedielem
    Very nice post! A related idea that I’ve found useful is Tenenbaum’s “suspicious coincidence” (eg web.mit.edu/cocosci/Papers…): A fair coin yielding HHHHHHHHHH is “anomalous” not because it has low probability under our model, but because it has high prob under an alternate model.
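
A small worked example of the coin argument above, as a sketch only. The alternate model (a coin that lands heads 99% of the time), the mixed comparison sequence, and the function names `seq_prob` and `likelihood_ratio` are assumptions for illustration and do not come from the post or the linked paper. The point it illustrates: under the fair model every length-10 sequence is equally improbable, so low probability alone cannot single out HHHHHHHHHH, but the ratio against an alternate model can.

```python
from fractions import Fraction


def seq_prob(seq, p_heads):
    """Probability of an exact H/T sequence for a coin with P(heads) = p_heads."""
    p = Fraction(1)
    for c in seq:
        p *= p_heads if c == "H" else 1 - p_heads
    return p


FAIR = Fraction(1, 2)
# Hypothetical alternate model: a trick coin that almost always lands heads.
ALTERNATE = Fraction(99, 100)


def likelihood_ratio(seq):
    """How much better the alternate model explains seq than the fair model."""
    return seq_prob(seq, ALTERNATE) / seq_prob(seq, FAIR)


all_heads = "H" * 10
mixed = "HTTHHTHTTH"  # an arbitrary sequence with 5 heads and 5 tails

# Both sequences have the same probability under the fair model.
print(seq_prob(all_heads, FAIR), seq_prob(mixed, FAIR))

# Only the all-heads sequence is far better explained by the alternate model.
print(float(likelihood_ratio(all_heads)), float(likelihood_ratio(mixed)))
```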
  • user avatar
    from a glance this is an excellent sequel to @lucastheis' 2015 paper, which remains the most important thing i've ever read on evaluating generative models
    What does it mean for an image, video, or text to be 𝑟𝑒𝑎𝑙𝑖𝑠𝑡𝑖𝑐? Despite how far we've come in 𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑛𝑔 realistic data, 𝑞𝑢𝑎𝑛𝑡𝑖𝑓𝑦𝑖𝑛𝑔 realism is still a poorly understood problem. I've shared my thoughts on how to correctly quantify realism here: