docs: add automated contributions policy to CONTRIBUTING.md#3831
Conversation
Added a section on automated contributions policy to clarify the use of AI tools in contributions. Updated maintainers list.
ianna
left a comment
As @pfackeldey suggested - the text is taken from here.
I would also suggest using the following labels to mark the PRs:

@TaiSakuma - please suggest the other two categories, as you mentioned in your document. Thanks
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3831
jpivarski
left a comment
This is a reasonable statement on AI because of the prohibition against "fully-automated tools" and the need to be able to "explain changes upon request."
At the UChicago DSI, we're using AI for increasingly large parts of code generation and review because it does the routine parts well and just needs to be directed at a high level. The biggest challenge is communicating what "directed at a high level" means to those who don't have a strong programming background, such as students. To avoid banalities like "make sure you're in control," which someone is only likely to understand if they're doing it anyway, I've proposed some concrete measures, like making sure that at least half of all AI interactions are information-seeking (e.g. "what does this function do?"), rather than purely generating (e.g. "write a function that does..."), and requiring them to do clean-up phases (e.g. "simplify this module as much as possible and improve the names") after everything works, because the worst code is generated by asking AI to keep messing with the same code block until it runs. And of course, all linters are in full effect as guardrails: Claude or Codex can be told to keep running them until all the issues are fixed, and that almost always leads to improvement.
Also for projects without students, we use it quite extensively. What that means for our own pull requests is that we must tell reviewers what has already been checked by AI and what remains for human judgement (e.g. "is this the right approach?"). I never leave functions without type hints or docstrings anymore because there's essentially zero cost to letting AI fill in that information (with mypy as a correctness linter). So it's important for PR reviewers to not spend their time checking these things and I, as a PR author, need to tell them that. Reviews have become a lot more intentional, requiring a statement of scope from the PR author, because intention is the main thing AI code generators/checkers lack.
I wonder if adding "unsolicited" would help. In my mind, this is more like a code of conduct, which is really only used when there's an action that needs to be taken and the CoC explains why that action is justified. For example, if I wanted to assign "at" copilot to an issue, and it generated a PR to solve it, that would be in violation of this policy. Since I'm a maintainer, and assuming someone noticed that I triggered it, I assume that this policy would not be enforced, but adding "unsolicited" would make it clear that this is not a violation at all. Also, pre-commit and Dependabot open computer-generated PRs; I wouldn't want those to be in violation of a policy. I think the idea here is just to protect against an AI-generated PR that we didn't ask for and that takes time to review.
@ianna - which two categories are you referring to?
I have a question about the labels
I would propose that the PR contributor does it.
I was wondering if those labels (or a different set) would work for the PR triage, e.g. its five outcomes.
I guess in this case we would need a label for no AI as well…
ikrommyd
left a comment
I support these guidelines.
I would also probably add something along the lines of what JAX says here: https://docs.jax.dev/en/latest/contributing.html#can-i-contribute-ai-generated-code
Basically, all contributions (AI-assisted included) should follow the Scikit-HEP code of conduct that we have, and we can be more picky about AI PRs in review if we know there was very little human involvement. I also really like the loose rule of thumb: "If the team needs to spend more time reviewing a contribution than the contributor spends generating it, then the contribution is probably not helpful to the project."
Finally, regarding the labels: on one hand, having a label to distinguish between those PR types is good. On the other hand, I don't think the contributor can add labels, can they? If they can, that's good, but it also gives them the ability to lie about it. If only we can change the labels, that can create some friction if they disagree with how we label a PR.
I generally would like a way to easily classify AI PRs, though, just like the NumPy team wanted, in case they need to be reverted at some point. I'm just a bit skeptical about how labeling will work. I'm fine with whatever you decide, though.
Thanks. All good points! I suggest we go with this as-is and update it as soon as Scikit-HEP agrees on a final policy.
Hi @ianna, thanks for putting this up. I'm generally happy with this guideline, and also thanks for updating the maintainer list. I think @henryiii has a good point, and we should add the word "unsolicited". I can imagine that in some cases we do want to explicitly ask AI for, e.g., review help or a summary in addition to our 'human' review. We could also add a PR template where people are prompted to disclose LLM usage, if any - not sure what you think about this? The labeling is interesting from a metrics point of view: in 1-2 years we could see how PR contributions have shifted, potentially toward more AI-driven contributions. Labels allow us to filter PRs and use GitHub's REST API to create some interesting plots in the future; there may be a chance to learn something about how software engineering with AI shifts for our field.
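As a sketch of that label-based metrics idea: the snippet below tallies pull requests per AI-usage label via GitHub's public REST API (`GET /repos/{owner}/{repo}/pulls`, which returns a `labels` list per PR). The label names here are hypothetical placeholders; the actual set is still under discussion in this thread.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Hypothetical label names -- the real set hasn't been decided yet.
AI_LABELS = ["ai-assisted", "ai-generated", "no-ai"]


def tally_ai_labels(pulls):
    """Count PRs per AI-usage label.

    `pulls` is a list of PR dicts in the shape returned by GitHub's
    REST API: each has a "labels" list of objects with a "name" key.
    """
    counts = Counter()
    for pr in pulls:
        names = {label["name"] for label in pr.get("labels", [])}
        for label in AI_LABELS:
            if label in names:
                counts[label] += 1
    return dict(counts)


def fetch_pulls(repo, state="all"):
    """Fetch one page (up to 100) of PRs for `repo` ("owner/name")."""
    url = f"https://api.github.com/repos/{repo}/pulls?state={state}&per_page=100"
    with urlopen(url) as resp:
        return json.load(resp)
```

Feeding `tally_ai_labels(fetch_pulls("scikit-hep/awkward"))` into a plotting library would give the kind of trend plot described above; for a repo with more than 100 PRs you would page through with the `page` query parameter.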
I like that too |
Good points! BTW, since we are in the EU, we need to add something like "This documentation was refined with AI assistance" to prepare for the 2026 regulations. I haven't found the corresponding US regulations yet - @henryiii, perhaps you have?
I intended these five outcomes to be categories for the AI triage to sort PRs into, so as to save human reviewers time. If people spend enough time on a PR to put it into a category, they've just reviewed the PR, and their time isn't saved. I can experiment with the AI triage on a different branch or a fork. If that works, we can roll it out to the main branch of the main repo.
Excellent idea! Thanks @TaiSakuma
For an example of a maintainer requested AI PR, scikit-hep/boost-histogram#1076 (requested in scikit-hep/boost-histogram#1074) |
By the way, I'm also not fond of this:
That includes a baked-in assumption that AI tools are not smart enough to work on Awkward, which will almost certainly be wrong in the future; I suspect it's wrong today. I think we should avoid statements that either become false soon or might even be false already. I expect this is a statement from scikit-learn from 1-2 years ago, when it might have been true.
TaiSakuma
left a comment
This text was taken from another project, as stated in #3831 (review), and appears to have been written in 2024 or earlier. I think it is outdated in January 2026. If we are to put it in place for now, I think we should revise it soon, within a few months.
The text discusses closing PRs within this "Automated contributions policy" section, but that topic isn't specific to this section. I think you can write about closing PRs more generally in a different section. In principle, you can close or ignore any PR without providing a reason. Perhaps you can state that you would normally review every PR as long as resources permit, but that this may not always be possible, and that you may have to stop reviewing and close a PR when sufficient resources aren't available.
Instead of, or in addition to, describing when you close PRs, you can state when you merge them. For example, you can state that you will only merge PRs that you have tested, understand, believe would be valuable to the project, and can maintain in the future.
Co-authored-by: Henry Schreiner <henry.fredrick.schreiner@cern.ch>
ikrommyd
left a comment
Apart from my one OCD suggestion, this looks good!
Co-authored-by: Iason Krommydas <iason.krom@gmail.com>
pfackeldey
left a comment
I like it a lot now 👍
jpivarski
left a comment
It's good! I was a little confused by the parenthetical phrase. If I'm wrong in my interpretation, don't take my suggestion as-is.
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
lgray
left a comment
After revision, the new text is reasonable and acceptable.
LGTM!
…#4051)

* Add an AI-assisted contributions policy taken mostly from Awkward Array's (https://github.com/scikit-hep/awkward/), which was based on Scikit-learn's Automated Contributions Policy.
* Add AI-assistance disclosure checkboxes to the pull request template.
* c.f.
  - scikit-hep/awkward#3831
  - scikit-learn/scikit-learn#32566

Note that the Awkward Array language is more pro-AI-usage while the Scikit-learn language is more neutral.

### Context

This was discussed in the AI section of the [2026 Snakemake Hackathon](https://indico.cern.ch/event/1574891/) at TUM ([GitHub project board](https://github.com/orgs/snakemake/projects/8)).

### QC

* [N/A] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

## Summary by CodeRabbit

* **Documentation**
  * Added an "AI-assisted contributions" subsection to the contribution guide covering attribution, limits on automation, reviewer expectations, and when to disclose significant AI assistance.
  * Updated the pull request template to include an AI-assistance disclosure section and checklist to ensure contributors declare use of AI tools during submission and review.

Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>