Karan Desai (KD)
@kdexd
Building @theworldlabs, prev: PhD @UMichCSE. I fight the devil in the details 🧐
San Francisco, CA
Joined March 2017
Posts
  • Pinned
    We cooked up something new at @theworldlabs! I am so excited to share our first product with everyone! Create, edit, and export large, persistent worlds now; sign up at marble.worldlabs.ai
    Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at marble.worldlabs.ai
  • Introducing "VirTex": a pretraining approach to learn visual features via language using fewer images. Pretrain: CNN+Transformer from scratch on COCO Captions. Transfer CNN: Results on 6 vision tasks match/exceed ImageNet pretraining (10x size wrt COCO)! arxiv.org/abs/2006.06666
  • The ImageNet dataset is now 12+ years old. So it contains pre-2009 images. Hmm, so hundreds of thousands of dogs in ImageNet are now... dead?! 😭
  • Once every 3 years, the Indian lunar calendar gets an extra month. In those years, the CVPR deadline eats up my Navratri, Vijayadashmi, and Diwali, and the ECCV/ICCV deadline eats up Holi. Sometimes it is hard, living alone thousands of miles away from family, knowing what I am missing.
  • 📢New dataset!📢 RedCaps: 12M image-text pairs from Reddit for vision and vision-and-language applications. Website: redcaps.xyz Paper: arxiv.org/abs/2111.11431 Check out captions from a RedCaps-trained model!⬇️ Try more here: huggingface.co/spaces/umichVi… What's new?🧵1/8
  • Replying to @ylecun and @soumithchintala
    I'm thinking of getting my son registered for NIPS 2040. Once I get a confirmation I will start searching for a wife.
  • I noticed a footnote in this paper. Why the rush? It is strange to arXiv a paper with a "known" bug but mention the bug only in a tiny footnote. This could be fixed by re-training in a few days. We, as a community, need to slow down.
    VMamba: Visual State Space Model paper page: huggingface.co/papers/2401.10… Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning. While CNNs exhibit remarkable scalability with linear
  • I defended my PhD dissertation last week!! 🎉🍾 Thanks to @jcjohnss for being a wonderful advisor, to my committee (@jasonbaldridge, @andrewhowens, Stella Yu) for their feedback, and to all my collaborators and friends for being a part of this journey!
  • VirTex is accepted to #CVPR2021! 🎉🎉 More details on paper and code (updated)⬇️
    Introducing "VirTex": a pretraining approach to learn visual features via language using fewer images. Pretrain: CNN+Transformer from scratch on COCO Captions. Transfer CNN: Results on 6 vision tasks match/exceed ImageNet pretraining (10x size wrt COCO)! arxiv.org/abs/2006.06666
  • This! I was rejected by every PhD program I applied to (2018, after undergrad). I took a gap year, interned at @gtcomputing, and got admitted to @UMichCSE (2019)! Apart from LORs, I gained experience, calibrated my expectations, and felt better prepared to start a PhD. Plus a network of awesome friends and collaborators!
    This is a bit non-standard, but if you missed opportunities to do research as an undergrad, you could take a gap year and join a lab as a full-time paid (usually not a lot) research assistant/intern. One full year of research experience could open up more doors for grad school.
  • Today marks the end of an awesome year-long internship at the @GeorgiaTech CVMLP lab of @DhruvBatraDB & @deviparikh. Thanks for this solid opportunity! Next position: PhD at @UMich, advised by @jcjohnss -- starting Fall 2019!
  • ICCV reviewing done! CVPR 2021 prep done!! PhD quals/prelims done!!! And finally NeurIPS done!!!! FAIR internship gaining momentum now! So I survived May! I can enjoy a day off after nearly a month, explore NYC, and play some board games, which I have missed for a long time! 🎉
  • Hello, @theworldlabs! 😍🎉 Excited to share that I have been building at World Labs after finishing my PhD! At World Labs, we are committed to building AI systems with a high level of spatial intelligence. All our lives, we humans constantly perceive and interact with the 3D
    Hello, world! We are World Labs, a spatial intelligence company building Large World Models (LWMs) to perceive, generate, and interact with the 3D world. Read more: worldlabs.ai/about
  • Accepted to #ICML2019! This one is memorable for me, my first publication :-)
    Our paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" is now on arXiv (arxiv.org/abs/1902.07864)! w/ @vrama91, @stefmlee, Marcus Rohrbach, @DhruvBatraDB, @deviparikh. We propose a class of probabilistic models for symbolic reasoning ...(1/3)