I am looking for an intern to do research together next summer.
Possible topics: representation learning, network architecture, and in general understanding what's going on :P.
Please apply (metacareers.com/jobs/532549086…) and email me ([email protected]) if interested.
Multimodal understanding & generation @xAI
- 4th of July vibe with you :P
- We are actively hiring for image/video understanding/generation, join us!
  Quoting: Join us to build next-gen video generation and world models!!
- Our serious look into diffusion models for representation learning. And NO: “diffusion” is just the cherry on top; “denoising” (the “latent” noise) is the cake to take!
  Quoting: Meta presents Deconstructing Denoising Diffusion Models for Self-Supervised Learning. Paper page: huggingface.co/papers/2401.14… We examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. Our philosophy is to…
- Thanks @abursuc for sharing our work! Yes, we find attention maps are almost* all you need from pre-trained ViTs. *Except when the data distribution shifts; perhaps…
  Quoting: Interesting work by @endernewton et al. studying how and what pretraining knowledge is transferred downstream. It seems that representations are less important than attention patterns, which can guide students to learn good features from scratch with good performance. arxiv.org/abs/2411.09702
- Very happy to see the TTT series reach yet another milestone! This time it serves as an inspiration for a next-generation, post-Transformer architecture, and by connecting TTT to the Transformer, it helps explain why (autoregressive) Transformers are so good at in-context learning!
  Quoting: Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show that our model trained on Books scales better (from 125M to 1.3B)
- Great finding from my former intern Kien: the inductive bias of *locality* is actually not as fundamental as we previously thought. Transformers can achieve *better* quality by simply treating images as an ordered set of pixels.
  Quoting: Meta announces An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels. This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision
- Open-source contribution from xAI!
  Quoting: The @xai Grok 2.5 model, which was our best model last year, is now open source. Grok 3 will be made open source in about 6 months. huggingface.co/xai-org/grok-2
- I got involved in @tokenpilled65B's project midway due to a shared interest in visual tokenization. I didn't contribute hands-on, but this work shares some of the (negative) learnings I had when trying to scale tokenizers, summarized for a quick read.
  Quoting: Excited to share my work at Meta! We explore scaling tokenizers with ViT (ViTok) and found that scaling tokenizers with a DiT generation pipeline doesn't boost performance for the current paradigm of auto-encoders! We develop SOTA tokenizers for images/videos. Thread for findings
- Fascinating and insightful work from @_mingjiesun and @liuzhuang1234, who take a much deeper look at the "massive activations" inside LLMs, propose hypotheses, and verify that these activations act as "biases" for attention. They can appear in ViTs too!
  Quoting: LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in the paper “Massive Activations in Large Language Models”. LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
- End of an era.
  Quoting: We're announcing new changes to our #AAdvantage program today. Learn more here: bit.ly/AADVUpdate2016
- A little push on 3D indoor object detection, to be presented at 4 PM today (Seattle time).
  Quoting: Today at #CVPR2020 at 4 PM, we're presenting ImVoteNet, a 2D-3D voting scheme for 3D object detection that's specialized for RGB-D and pushes state-of-the-art 3D object detection by 5.7 mean average precision. Read the paper here: research.fb.com/publications/i…