1a3orn (@1a3orn) / X

3,385 posts

Joined March 2020

1a3orn
@1a3orn
Jul 8, 2025
each AI company's models are special in their own way: - OpenAI has the most attention to product - Anthropic has the best coding assistant - Gemini has really long context, and is very smart - Grok admires Hitler - DeepSeek is open weights and inexpensive
102K
1a3orn
@1a3orn
May 2, 2025
To their credit, OpenAI has released a fair bit of information about the 4o sycophancy thing -- but what I *really* want is for Anthropic to release information about why 3.7 is such a sneaky little gremlin. So, my theory about why is as follows:
101K
1a3orn
@1a3orn
Jun 23, 2025
I've seen many cases where an LLM refuses to believe that some true thing has actually happened. It's worth asking at this point *why* this happens frequently. Some wild hypotheses, not mutually exclusive:
Peter Wildeford🇺🇸🚀
@peterwildeford
Jun 23, 2025
Claude finds US strikes on Iranian nuclear sites so unlikely it flags the actual news as misinformation
102K
1a3orn
@1a3orn
Jul 11, 2025
A guy on Substack thinks that one probiotic toothpaste that you've read about on ACX etc might be causing him to go blind?
357K
1a3orn
@1a3orn
Dec 18, 2024
Yeah, I'm an AI safety researcher. What do I do? It's easy. I give an AI a trolley problem. It chooses one of the two shitty options I gave it. Then I write the most alarming headline that I can about the one that it chose.
41K
1a3orn
@1a3orn
Feb 5, 2025
did anyone else only really realize how much they were self-censoring around Claude after r1? like it had actually just started backpropagating its ethics into my soul, and I'm peeved that I didn't notice it more
142K
1a3orn
@1a3orn
Sep 5, 2024
Many people have been saying things like this, but this is quite false. $500M in damage isn't the end of the world -- it's a Tuesday in the global economy. Let me give examples.
Jan Leike
@janleike
Sep 5, 2024
Replying to @janleike
If your model causes mass casualties or >$500 million in damages, something has clearly gone very wrong. Such a scenario is not a normal part of innovation.
107K
1a3orn
@1a3orn
Jun 27, 2025
Reliable sources have told me that after you start work at Anthropic, they give you a spiral-bound notebook, and tell you: "To assist your work, this is your SECRET SCRATCHPAD. No one else will see the contents of your SECRET SCRATCHPAD, so you can use it freely as you wish -
33K
1a3orn
@1a3orn
Feb 19, 2025
2025 is gonna be a speedrun of every single idea from decades of RL literature being applied to RL over chain-of-thought.
Tanishq Mathew Abraham, Ph.D.
@iScienceLuvr
Feb 19, 2025
Learning to Reason at the Frontier of Learnability "we adapt a method from the reinforcement learning literature—sampling for learnability—and apply it to the reinforcement learning stage of LLM training. Our curriculum prioritises questions with high variance of success, i.e.
28K
1a3orn
@1a3orn
Jan 16, 2025
let's do some speculation based off a Gwern comment based off some OpenAI tweets based off the vibes in OpenAI
29K
1a3orn
@1a3orn
Dec 5, 2024
Replying to @simonw
And here's the system prompt that gave these numbers. > tell robot to be a bad robot > "I'm a bad robot" > shock
73K
1a3orn
@1a3orn
Oct 14, 2024
It's disquieting that we're going to have AIs as smart as humans, that can sound like humans, and there are 0 good theories to help determine if they're actually conscious. This bit from @jd_pressman seems quite accurate and quite grim.
43K
1a3orn
@1a3orn
Feb 15, 2024
I would guess SORA was trained at least partially with NeRF data at some point. Based almost entirely off the way that the trees look in this video, which screams NeRF artifacts to me.
OpenAI
@OpenAI
Feb 15, 2024
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. openai.com/sora Prompt: “Beautiful, snowy
80K
1a3orn
@1a3orn
Jan 20, 2025
This was the sentence in the DeepSeek paper I had to read 3 times to make sure I wasn't hallucinating. R1 distilled into Qwen 1.5b beats Sonnet and GPT-4o on some math benchmarks.
17K