You know why people think Kimi K2 doesn't sound like "botslop"? It's because it's... how should I put it... it's very Chinese English (not in the Chinglish way... it's hard to describe). Perhaps the most accessible analogy I have is the first time you read Xianxia in English...
difficultyang
3,179 posts
- Forgive me father, I am going to bed tonight with no running RL experiments
- Yeah, codex-cli is actually unambiguously better than claude code now
- Reading jax-ml.github.io/scaling-book/g… really makes you itch to take notes, like the good ol' days in uni
- The fact that RL can get permanently fucked if you step off a cliff into bad policy space is so fucking wild. It's like if something bad happened to a human one day and then they're like permanently fucked for the rest of their life. Oh wait...
- TIL, RIP Triton, killed by inability to have good Blackwell performance
- This image will forever be the transformers architecture diagram, even if it actually is needlessly obtuse for modern dense LLM architectures :/
- It's kind of nuts that codex was post-trained to do all file edits by shell. Like, it clearly seems to work, I'm just... man, it's not what I was expecting. It's especially impressive that they're not afraid of the model hallucinating line numbers
- Baby's first words: baba, mama, cuda
- it will never stop being funny to me that PyTorch's directory in our internal monorepo is caffe2/
- Replying to @difficultyang: ...it feels so fresh, it feels so novel, the attitudes and the writing are so different than what you've read before. And then you read your second and your third and you're like "oh wait, this is just its own subculture with its own recognizable patterns."
- PyTorch's default profiling experience is exasperatingly bad I don't understaaaand
- Who needs CuTe layouts, just brute force it with a representation that can literally represent any rearrangement of bits from your input coordinates!





