GLADIA Research Lab
Based in Rome, GLADIA is a team of computer scientists, physicists, engineers and mathematicians venturing beyond the boundaries of machine intelligence.
- LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
- After reading many of the replies, we would like to issue a few clarifications:
  - We cannot extract training data from the model using our method.
  - LLMs are not injective w.r.t. the output text: that function is definitely non-injective, and collisions occur all the time.
- Language models are structurally lossless:
  - Hidden states do not compress or abstract the prompt;
  - Any system storing them effectively stores the input text itself;
  - This impacts privacy, deletion, and compliance: once data enters a Transformer, it remains recoverable. (5/6)
- Injectivity is not accidental, but a structural property of language models! We show that:
  - Transformers are real-analytic by composition
  - At initialization, collisions occur with probability zero
  - Gradient descent preserves this property throughout training (2/6)
- But what can we do with injectivity? Well, for one, we can invert language models! We introduce SipIt, an algorithm that exactly reconstructs the input from hidden states in guaranteed linear time. SipIt recovers inputs >100× faster than alternatives, while remaining exact.
- We back our theory with extensive empirical confirmation. Across billions of prompt pairs and several model sizes, we find no collisions: no two prompts are mapped to the same hidden states! (3/6)
- Preprint: arxiv.org/abs/2510.15511. Joint work w/ @GiorgosNik02, @tommaso_mncttn, @DonatoCrisosto1, @teelinsan, Yannis Panagakis, and @EmanueleRodola. Stay tuned! (6/6)
- But also, bold frontier ideas, like @tensorqt's series "The graph side of Attention". The series opens with a post explaining attention sinks as a bias in causal Transformers.
- We will use this page to popularize our research and deliver tailor-made blog posts outlining our vision for the future of Machine Learning. Welcome to GLADIA. More on us: gladia.netlify.app. Our blog: gladia-research-group.github.io/blog/
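
The injectivity thread above says that a prompt can be recovered token by token from its hidden states. A toy sketch of why injectivity plus causality makes this plausible is given here. It is not SipIt and it does not use a Transformer: the recurrent update `step`, the random parameters `EMB` and `MIX`, and the names `hidden_states` and `invert` are all hypothetical stand-ins chosen for illustration. The only idea borrowed from the thread is that, if each state depends on the prefix alone and different tokens give different states, one pass over the vocabulary per position is enough.

```python
import math
import random

VOCAB_SIZE = 50
DIM = 8

# Hypothetical toy parameters (assumption): random embeddings and a random mixing matrix.
rng = random.Random(0)
EMB = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
MIX = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def step(prev_state, token):
    """Toy causal update: the new state depends only on the previous state and the current token."""
    return [
        math.tanh(EMB[token][i] + sum(MIX[i][j] * prev_state[j] for j in range(DIM)))
        for i in range(DIM)
    ]


def hidden_states(tokens):
    """Run the toy model over a token sequence and return one state per position."""
    state = [0.0] * DIM
    states = []
    for tok in tokens:
        state = step(state, tok)
        states.append(state)
    return states


def invert(states, tol=1e-9):
    """Recover tokens left to right by comparing each observed state with every candidate token."""
    recovered = []
    prev = [0.0] * DIM
    for target in states:
        matches = [
            tok
            for tok in range(VOCAB_SIZE)
            if max(abs(a - b) for a, b in zip(step(prev, tok), target)) <= tol
        ]
        if len(matches) != 1:
            # A collision (or no match) would mean the toy map is not injective here.
            raise ValueError("ambiguous or missing match")
        recovered.append(matches[0])
        prev = target  # the observed state is the prefix summary for the next position
    return recovered
```

For example, `invert(hidden_states([3, 17, 17, 42, 0, 8]))` should return the original token list. The cost is one vocabulary scan per position, so it grows linearly with prompt length in this toy setting.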







