Dmitry Kobak (@hippopedoid) / X

Dmitry Kobak

2,702 posts

@hippopedoid

Researcher at Ghent University and VIB-AI. Manifold learning, contrastive learning, scRNAseq data. Excess mortality. Born but to die and reas'ning but to err.

Ghent, Belgium

Joined December 2019

Pinned
Dmitry Kobak
@hippopedoid
May 14
Logged in for the first time in over a year to say that I stopped using Twitter and moved to Bsky (same username). Wasn't using it much either until now but plan to revive. See you there! Also, I moved from Tübingen to Ghent and joined @ugent and @_VIB_AI! Exciting times.
Dmitry Kobak
@hippopedoid
Mar 30, 2023
We held a reading group on Transformers (watched videos / read blog posts / studied papers by @giffmana @karpathy @ch402 @amaarora @JayAlammar @srush_nlp et al.), and now I _finally_ roughly understand what attention does. Here is my take on it. A summary thread. 1/n
Dmitry Kobak
@hippopedoid
Jun 27, 2023
The smallest p-value that I've ever seen reported is 😱 3.6 * 10^-2382 😱 Paper: science.org/doi/10.1126/sc… Sci-hub link: sci-hub.st/10.1126/scienc… If you ever see a smaller p-value, do let me know! I collect them. [1/3]
Dmitry Kobak
@hippopedoid
Jun 21, 2024
How many academic papers are written with the help of ChatGPT? To answer this question, we analyzed 14 million PubMed abstracts from 2010 to 2024 and looked for excess words: ** Delving into ChatGPT usage in academic writing through excess vocabulary ** arxiv.org/abs/2406.07016 1/11
Dmitry Kobak
@hippopedoid
Apr 13, 2023
Really excited to present new work by @ritagonmar: we visualized the entire PubMed library, 21 million biomedical and life science papers, and learned a lot about -- THE LANDSCAPE OF BIOMEDICAL RESEARCH biorxiv.org/content/10.110… Joint work with @CellTypist and @benmschmidt. 1/n
Dmitry Kobak
@hippopedoid
May 26, 2023
Updated our preprint to add a map of retracted papers (11k) in PubMed (21m). There are clear clusters and we believe it's paper mill activity. E.g. there is a small island with 11% (!) retractions; we argue that the other 89% are all VERY suspicious and deserve further scrutiny.
Dmitry Kobak
@hippopedoid
Nov 2, 2020
I am teaching Machine Learning I this semester at @uni_tue. Lectures will be posted online. Here is Lecture 1 with some introduction about ML vs statistics, and then a detailed treatment of a baby version of linear regression, including baby gradient descent. youtu.be/lWGdFeMsjzg
Dmitry Kobak
@hippopedoid
Sep 30, 2021
So what's up with the Russian election two weeks ago? Was there fraud? Of course there was fraud. Widespread ballot stuffing was videotaped etc., but we can also prove fraud using statistics. See these *integer peaks* in the histograms of the polling station results? 🕵️‍♂️ [1/n]
Dmitry Kobak
@hippopedoid
Sep 13, 2021
I am late to the party (was on holidays), but have now read @lpachter's "Specious Art" paper as well as ~300 quote tweets/threads, played with the code, and can add my two cents. Spoiler: I disagree with their conclusions. Some claims re t-SNE/UMAP are misleading. Thread. 🐘
Lior Pachter
@lpachter
Aug 27, 2021
It's time to stop making t-SNE & UMAP plots. In a new preprint w/ Tara Chari we show that while they display some correlation with the underlying high-dimension data, they don't preserve local or global structure & are misleading. They're also arbitrary.🧵biorxiv.org/content/10.110…
Dmitry Kobak
@hippopedoid
Sep 21, 2020
Weirdly enough, I now have over 2^10 followers here on Twitter despite writing almost exclusively about t-SNE ;-) So here is a ❤️-shaped t-SNE embedding of MNIST. Can you guess how I made it? Explanation tomorrow.
Dmitry Kobak
@hippopedoid
Dec 20, 2019
A year ago in Nature Biotechnology, Becht et al. argued that UMAP preserved global structure better than t-SNE. Now @GCLinderman and I wrote a comment saying that their results were entirely due to the different initialization choices: biorxiv.org/content/10.110…. Thread. (1/n)
biorxiv.org
UMAP does not preserve global structure any better than t-SNE when using the same initialization
One of the most ubiquitous analysis tools employed in single-cell transcriptomics and cytometry is t-distributed stochastic neighbor embedding (t-SNE) [1], used to visualize individual cells as...
Dmitry Kobak
@hippopedoid
Feb 1, 2021
My and @GCLinderman's comment on the Becht et al. 2018 (@EtienneBecht) paper has finally appeared in @NatureBiotech after over a year of editorial considerations. There is no response from the authors, so I assume we are all in agreement :-) nature.com/articles/s4158…
Dmitry Kobak
@hippopedoid
Apr 15, 2020
Did you know that the optimal ridge penalty λ in linear regression can be *negative*? It's always strictly positive when n>p. Or when cov(x)=I. Or when true β is random. But here we argue that it can be zero or even negative when p>>n: arxiv.org/abs/1805.10939. HOW?! [1/n]
Dmitry Kobak
@hippopedoid
Dec 2, 2022
A very long overdue thread: happy to share preprint led by Sebastian Damrich from @FredHamprecht's lab. *From t-SNE to UMAP with contrastive learning* arxiv.org/abs/2206.01816 I think we have finally understood the *real* difference between t-SNE and UMAP. It involves NCE! [1/n]