Contributing: Require a human in the loop for LLM contributions #18315
Conversation
This policy is derived from https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md: We have a problem with autonomous contributions, and with users trying to get the LLM to solve the hard problems, where it usually fails.

I've kept the section intentionally concise, without laying out all the branches of valid and invalid LLM usage. For example, it's entirely possible to first have the LLM write the change, review it, develop a design from that, iterate on it, and end up with a fully human-understood PR. What I want to discourage is contributors thinking they can outsource, e.g., the hard algorithmic parts to the LLM without understanding the existing structure, which is where the LLM currently inevitably fails.
> When using LLMs, there must always be a human in the loop that fully understands the code. You need to be able to explain all changes and how they interact with the rest of the codebase without LLM, as LLMs cannot (yet) correctly reason about complex codebases. Generally, this requires doing the design of how the change yourself and understanding all involved datastructures, formats and algorithms, while only use the LLM for directed coding. Autonomous contributions from AI agents are not allowed.
Some small grammatical corrections and light copy editing:
> When using LLMs, there must always be a human in the loop who fully understands the code. You need to be able to explain all changes and how they interact with the rest of the codebase without LLM assistance, as LLMs cannot (yet) correctly reason about complex codebases. Generally, this requires designing the change yourself and understanding all involved data structures, formats, and algorithms, while only using the LLM for directed coding. Autonomous contributions from AI agents are not allowed.
I think expecting contributors to fully understand the code is a bit of a stretch even when LLMs are not involved.
Moreover, I think it's fine to use an LLM to assist with understanding. The main problem arises when people substitute the LLM entirely for a good-faith attempt at understanding.
At the end of the day, the goal of this paragraph is to have something to point at when dismissing a contributor who is over-reliant on an LLM and is therefore causing an excessive review load.
So maybe something like:
> When using LLMs, there must always be a human in the loop who is making a conscious, good-faith effort to understand the code and the changes they are making to it. LLMs cannot (yet) correctly reason about complex codebases. They can be used as a tool to support your own understanding and aid in implementation, but not as a substitute for it. Autonomous contributions from AI agents are not allowed.
My idea is that when you make a PR, you should be able to explain, when prompted, what each line in the change does. This is a high bar, but usually required for uv: if there's a change that just seems to work, but that we can't reason about, it becomes a point of failure. Before I create a PR, I go through the diff and check that all changes look right. There's of course a lot of non-local knowledge that contributors don't have and that we have to add in the PR review. But in my experience, contributors just putting up a diff they don't understand means we have to design the whole change ourselves. A lot of work and iteration goes into validating each change, and a contribution should be focussed on that part of the work rather than on the code itself (which may be noisier in review but is much easier to change). This may be a problem particular to the stage uv is currently at.
LLMs are great at helping you understand where things are defined and how the logic flows, and at finding problems. But they also give you a lot of made-up answers if you leave them to reason on their own; you have to ask where stuff is defined and check it yourself (which is now much easier, because the LLM can tell you the exact locations to check).
Yeah, I agree, but the way it's worded now is stricter than I think you mean to convey, at least the way I read it.
I've submitted a new suggestion which I hope is closer to something we'd both agree with?
Co-authored-by: Tomasz Kramkowski <tom@astral.sh>