akbir. (@akbirkhan) / X

6,006 posts

🐜

Joined June 2011

Pinned
akbir.
@akbirkhan
Jul 2, 2025
here is my thesis “Safe Automated Research”. i worked on 3 approaches to make sure we can trust the output of automated researchers as we reach this new era of science. it was a very fun PhD
19K
akbir.
@akbirkhan
Mar 10, 2025
In the spirit of making more real world evals, here is the Factorio Learning Environment (FLE). Spurred by wanting to eval if models are good paperclip maximisers, we check how well agents build factories for other things 🏗️🏭🛠️
114K
akbir.
@akbirkhan
Jul 23, 2024
on a completely unrelated note i’ll be starting at @AnthropicAI after my phd // excited to align super intelligence 🐜
43K
akbir.
@akbirkhan
Jul 22, 2024
excited to announce this received an “ICML Best Paper Award”! come see our talk at 10:30 tomorrow
akbir.
@akbirkhan
Feb 7, 2024
How can we check LLM outputs in domains where we are not experts? We find that non-expert humans answer questions better after reading debates between expert LLMs. Moreover, human judges are more accurate as experts get more persuasive. 📈 github.com/ucl-dark/llm_d…
76K
akbir.
@akbirkhan
May 24, 2023
Papers i wish i was allowed to write.
15K
akbir.
@akbirkhan
Dec 2, 2024
I’m recruiting Fellows to work with me on Aligning Superhuman models. My associated fellows will work on thinking about what honesty, values and alignment are. I need people who: - get that models are gonna be smarter than us - are opinionated on deciphering human intuition -
Anthropic
@AnthropicAI
Dec 2, 2024
We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.
24K
akbir.
@akbirkhan
Mar 10, 2025
Replying to @akbirkhan
In general, models get bottlenecked on two things: 1) Planning (models end up exhausting their resources) 2) Spatial reasoning (to plan efficient factory topologies). GPT-4o-Mini even asked us to turn it off at one point because it was unrecoverable 🥹
6.2K
akbir.
@akbirkhan
Mar 10, 2025
Replying to @akbirkhan
Your ability to build is dependent on how much you currently produce, so small differences in model capabilities really compound! Sonnet-3.6 produces 10x more resources than GPT-4o-Mini by the end of our play.
3.5K
akbir.
@akbirkhan
Mar 10, 2025
Replying to @akbirkhan
We provide a programmatic interface where agents are able to engage with the game via code. This lets us evaluate how good code-based agents are at planning, interacting with environments and building complex systems.
3.6K
akbir.
@akbirkhan
Mar 10, 2025
Replying to @akbirkhan
Factorio is a resource management and automation game where the goal is to build the largest factory. Agents are dropped onto a fresh world and begin collecting resources, investing in technology and building factories to create more complex resources.
3.8K
akbir.
@akbirkhan
Jun 12, 2024
chatgpt.com/share/b20705ba… ^willing to take bets we get human level performance by NeurIPS
39K
akbir.
@akbirkhan
Mar 10, 2025
Replying to @akbirkhan
The awesome thing about Factorio is that there is no upper bound on how many resources you can produce, and the technology tree is infinite [1]. This means the eval should not saturate (we'd expect reward hacks before then).
[1] Technologies, from wiki.factorio.com
3.4K
akbir.
@akbirkhan
May 20, 2022
my feed is probs 95% ppl stressing about agi and then @hardmaru posting totoro fanart