Japan is a bank

bhauth

17 Jan 2026 16:33 UTC

9 points

0 comments · 1 min read · LW link

(www.bhauth.com)

Forfeiting Ill-Gotten Gains

jefftk

17 Jan 2026 0:20 UTC

25 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Is It Reasoning or Just a Fixed Bias?

Sriram Kiron

16 Jan 2026 21:43 UTC

12 points

0 comments · 1 min read · LW link

(ramaway.com)

Future-as-Label: Scalable Supervision from Real-World Outcomes

Ben Turtel

16 Jan 2026 21:21 UTC

−2 points

0 comments · 1 min read · LW link

Comparing yourself to other people

dominicq

16 Jan 2026 20:31 UTC

10 points

2 comments · 2 min read · LW link

(sundaystopwatch.eu)

Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks

James_Miller

16 Jan 2026 18:43 UTC

79 points

3 comments · 63 min read · LW link

Why falling labor share ≠ falling employment

Lydia Nottingham

16 Jan 2026 17:27 UTC

−5 points

2 comments · 2 min read · LW link

(lydianottingham.substack.com)

Digital Minds: A Quickstart Guide

Avi Parrack and Štěpán Los

16 Jan 2026 17:15 UTC

10 points

0 comments · 22 min read · LW link

(aviparrack.substack.com)

The culture and design of human-AI interactions

zef

16 Jan 2026 17:11 UTC

2 points

0 comments · 4 min read · LW link

(bloodsteel.substack.com)

Scaling Laws for Economic Impacts: Experimental Evidence from 500 Professionals and 13 LLMs

Ali Merali

16 Jan 2026 13:40 UTC

12 points

2 comments · 4 min read · LW link

[Pre-print] Building safe AGI as an ergonomics problem

ricardotkcl and paris.lalousis@kcl.ac.uk

16 Jan 2026 13:18 UTC

1 point

0 comments · 1 min read · LW link

(doi.org)

Powerful misaligned AIs may be extremely persuasive, especially absent mitigations

Cody Rushing

16 Jan 2026 8:08 UTC

52 points

2 comments · 14 min read · LW link

Should control down-weight negative net-sabotage-value threats?

Fabien Roger

16 Jan 2026 4:18 UTC

33 points

0 comments · 10 min read · LW link

Total utilitarianism is fine

Abhimanyu Pallavi Sudhir

16 Jan 2026 0:32 UTC

3 points

2 comments · 3 min read · LW link

Test your interpretability techniques by de-censoring Chinese models

Khoi Tran, aryaj, Senthooran Rajamanoharan and Neel Nanda

15 Jan 2026 16:33 UTC

66 points

5 comments · 20 min read · LW link

Reflections on TA-ing Harvard’s first AI safety course

Roy Rinberg

15 Jan 2026 16:28 UTC

74 points

4 comments · 9 min read · LW link

I Made a Judgment Calibration Game for Beginners (Calibrate)

Luise Woehlke

15 Jan 2026 15:04 UTC

13 points

2 comments · 1 min read · LW link

Corrigibility Scales To Value Alignment

PeterMcCluskey

15 Jan 2026 0:05 UTC

11 points

5 comments · 5 min read · LW link

(bayesianinvestor.com)

Deeper Reviews for the top 15 (of the 2024 Review)

Raemon

14 Jan 2026 23:59 UTC

44 points

2 comments · 5 min read · LW link

If we get primary cruxes right, secondary cruxes will be solved automatically

Jordan Arel

14 Jan 2026 22:44 UTC

1 point

1 comment · 4 min read · LW link