Dmitry Kobak
@hippopedoid
Researcher at Ghent University and VIB-AI. Manifold learning, contrastive learning, scRNAseq data. Excess mortality. Born but to die and reas'ning but to err.
Ghent, Belgium
Joined December 2019
Posts
  • Pinned
    Logged in for the first time in over a year to say that I stopped using Twitter and moved to Bsky (same username). Wasn't using it much either until now but plan to revive. See you there! Also, I moved from Tübingen to Ghent and joined @ugent and @_VIB_AI! Exciting times.
  • We held a reading group on Transformers (watched videos / read blog posts / studied papers by @giffmana @karpathy @ch402 @amaarora @JayAlammar @srush_nlp et al.), and now I _finally_ roughly understand what attention does. Here is my take on it. A summary thread. 1/n
  • The smallest p-value that I've ever seen reported is 😱 3.6 * 10^-2382 😱 Paper: science.org/doi/10.1126/sc… Sci-hub link: sci-hub.st/10.1126/scienc… If you ever see a smaller p-value, do let me know! I collect them. [1/3]
  • How many academic papers are written with the help of ChatGPT? To answer this question, we analyzed 14mln PubMed abstracts from 2010 to 2024 and looked for excess words: ** Delving into ChatGPT usage in academic writing through excess vocabulary ** arxiv.org/abs/2406.07016 1/11
  • Really excited to present new work by @ritagonmar: we visualized the entire PubMed library, 21 million biomedical and life science papers, and learned a lot about -- THE LANDSCAPE OF BIOMEDICAL RESEARCH biorxiv.org/content/10.110… Joint work with @CellTypist and @benmschmidt. 1/n
  • Updated our preprint to add a map of retracted papers (11k) in PubMed (21m). There are clear clusters and we believe it's paper mill activity. E.g. there is a small island with 11% (!) retractions; we argue that the other 89% are all VERY suspicious and deserve further scrutiny.
    Really excited to present new work by @ritagonmar: we visualized the entire PubMed library, 21 million biomedical and life science papers, and learned a lot about -- THE LANDSCAPE OF BIOMEDICAL RESEARCH biorxiv.org/content/10.110… Joint work with @CellTypist and @benmschmidt. 1/n
  • I am teaching Machine Learning I this semester at @uni_tue. Lectures will be posted online. Here is Lecture 1 with some introduction about ML vs statistics, and then a detailed treatment of a baby version of linear regression, inc. baby gradient descent. youtu.be/lWGdFeMsjzg
  • So what's up with the Russian election two weeks ago? Was there fraud? Of course there was fraud. Widespread ballot stuffing was videotaped etc., but we can also prove fraud using statistics. See these *integer peaks* in the histograms of the polling station results? 🕵️‍♂️ [1/n]
  • I am late to the party (was on holidays), but have now read @lpachter's "Specious Art" paper as well as ~300 quote tweets/threads, played with the code, and can add my two cents. Spoiler: I disagree with their conclusions. Some claims re t-SNE/UMAP are misleading. Thread. 🐘
    It's time to stop making t-SNE & UMAP plots. In a new preprint w/ Tara Chari we show that while they display some correlation with the underlying high-dimension data, they don't preserve local or global structure & are misleading. They're also arbitrary.🧵biorxiv.org/content/10.110…
  • Weirdly enough, I now have over 2^10 followers here on Twitter despite writing almost exclusively about t-SNE ;-) So here is a ❤️-shaped t-SNE embedding of MNIST. Can you guess how I made it? Explanation tomorrow.
  • A year ago in Nature Biotechnology, Becht et al. argued that UMAP preserved global structure better than t-SNE. Now @GCLinderman and I wrote a comment saying that their results were entirely due to the different initialization choices: biorxiv.org/content/10.110…. Thread. (1/n)
  • Mine and @GCLinderman's comment to the Becht et al. 2018 (@EtienneBecht) paper has finally appeared in @NatureBiotech after over a year of editorial considerations. There is no response from the authors, so I assume we are all in agreement :-) nature.com/articles/s4158…
  • Did you know that the optimal ridge penalty λ in linear regression can be *negative*? It's always strictly positive when n>p. Or when cov(x)=I. Or when true β is random. But here we argue that it can be zero or even negative when p>>n: arxiv.org/abs/1805.10939. HOW?! [1/n]
  • A very long overdue thread: happy to share preprint led by Sebastian Damrich from @FredHamprecht's lab. *From t-SNE to UMAP with contrastive learning* arxiv.org/abs/2206.01816 I think we have finally understood the *real* difference between t-SNE and UMAP. It involves NCE! [1/n]