Add support for suppressing SRCS env variables (in the case of rules with large numbers of sources) #3393

toastwaffle · 2025-07-28T14:05:39Z

Instead, we write files to the build environment containing the list of sources which would otherwise be in the environment variables.

chrisnovakovic

I understand the use case for this, but I feel that suppressing the list of inputs to a build target goes against one of the core tenets of Please, namely that build environments are explicitly told which files they must output given a list of files as inputs. The problem here is that the input list can exceed the maximum length of an environment variable value when stringified, but I'm not convinced that the solution is to eliminate that information entirely - I think it'd be much better to provide it via a mechanism that isn't subject to the constraints imposed on environment variables, such as in a file.

I'm also not keen on the way this impacts all named sources, even those that would be short enough to fit inside an environment variable value when stringified - consider how difficult it would be to express the cmd for a target with the following sources:

genrule(
    # ...
   srcs = {
        "excessive": glob([
            # matches a huge number of files whose length exceeds `ARG_MAX`
        ]),
        "ok": [
            # a smaller but substantial list of specific file names, not a glob
        ],
    },
    # ...
)

Where did we land on allowing targets to opt in to receiving source file lists in a file rather than an environment variable (suggested in #3390)? I think that's an all-round better approach.
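
To make the size problem concrete, here is a minimal Python sketch (not part of Please) that estimates how many bytes a space-separated source list would occupy as a single environment entry and compares that with a limit. The helper names and the default limit are assumptions for illustration: on Linux with 4 KiB pages a single argument or environment string is capped at 131072 bytes (`MAX_ARG_STRLEN`), separately from the overall `ARG_MAX` budget, and other platforms behave differently.

```python
# Assumed limit: Linux MAX_ARG_STRLEN (32 pages) with 4 KiB pages.
MAX_ENV_STRING = 32 * 4096


def env_entry_size(name, sources):
    """Bytes needed for NAME=value plus the trailing NUL, sources joined by spaces."""
    value = " ".join(sources)
    return len(name.encode()) + 1 + len(value.encode()) + 1


def fits_in_env(name, sources, limit=MAX_ENV_STRING):
    """True if the stringified source list fits in one environment entry."""
    return env_entry_size(name, sources) <= limit


# Hypothetical source lists: a short explicit one and a large generated one.
small = ["main.go", "util.go"]
large = [f"pkg/generated/file_{i:06d}.go" for i in range(10_000)]

print(fits_in_env("SRCS", small))
print(fits_in_env("SRCS", large))
```

A check like this is only an estimate; the real failure shows up when the build process is executed with an oversized environment.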

toastwaffle · 2025-07-31T13:44:30Z

> I understand the use case for this, but I feel that suppressing the list of inputs to a build target goes against one of the core tenets of Please, namely that build environments are explicitly told which files they must output given a list of files as inputs. The problem here is that the input list can exceed the maximum length of an environment variable value when stringified, but I'm not convinced that the solution is to eliminate that information entirely - I think it'd be much better to provide it via a mechanism that isn't subject to the constraints imposed on environment variables, such as in a file.
>
> I'm also not keen on the way this impacts all named sources, even those that would be short enough to fit inside an environment variable value when stringified - consider how difficult it would be to express the cmd for a target with the following sources:
>
> genrule(
>     # ...
>     srcs = {
>         "excessive": glob([
>             # matches a huge number of files whose length exceeds `ARG_MAX`
>         ]),
>         "ok": [
>             # a smaller but substantial list of specific file names, not a glob
>         ],
>     },
>     # ...
> )
>
> Where did we land on allowing targets to opt in to receiving source file lists in a file rather than an environment variable (suggested in #3390)? I think that's an all-round better approach.

I think it's reasonable for a target to be able to declare explicitly that it does not need the environment variables that would be generated (hence the move away from having Please suppress the variables whenever there were just too many sources). Is your concern that having this option would open the gate to other build defs abusing it?

I think the problem we're trying to fix is an exceptional case for which I expect this escape hatch to be used very rarely, and I don't think it justifies the additional effort to add support for putting the build environment into files (and the potential sharp edges that could come with doing that), especially when we have no other use case for it and a working solution for Go.

chrisnovakovic · 2025-07-31T14:06:00Z

> I think it's reasonable for a target to be able to declare explicitly that it does not need the environment variables that would be generated (hence the move away from having Please suppress the variables whenever there were just too many sources). Is your concern that having this option would open the gate to other build defs abusing it?

Not abusing it, per se (it's opt-in, after all), but I can see other situations where downstream users might run into the same (or a similar) problem but in a way that can't be worked around with this approach. The long/short named sources example above is one of them; another is one in which two excessively long lists of sources are presented to the build environment and the command is still somehow expected to process them separately. We'd then have to think carefully about how to solve that problem in a way that's backwards-compatible with what's being proposed here.

> I think the problem we're trying to fix is an exceptional case for which I expect this escape hatch to be used very rarely, and I don't think it justifies the additional effort to add support for putting the build environment into files (and the potential sharp edges that could come with doing that), especially when we have no other use case for it and a working solution for Go.

I think this is actually the crux of my concern: it feels like a highly specific solution developed in response to a highly specific problem. I appreciate the rarity of the problem - after all, we haven't encountered any other reports of this in nearly a decade of public Please development - but I feel that any solution implemented within the main Please code base ought to be generic enough to solve similar problems that might crop up in future, however unlikely we think they might be right now.

toastwaffle · 2025-07-31T14:50:29Z

Okay, colour me convinced :) do you have any suggestions for where to put those files in the sandbox? I was thinking something like <sandbox_dir>/__plz_build_environment/SRCS (i.e. filenames matching what the env var would otherwise be called). And do you think it should be SRCS_GO_FILE, or SRCSFILE_GO, or SRCS_FILE_GO?

chrisnovakovic · 2025-08-01T10:43:29Z

> Okay, colour me convinced :) do you have any suggestions for where to put those files in the sandbox? I was thinking something like <sandbox_dir>/__plz_build_environment/SRCS (i.e. filenames matching what the env var would otherwise be called). And do you think it should be SRCS_GO_FILE, or SRCSFILE_GO, or SRCS_FILE_GO?

IMO we should keep the directory structure as "Please-like" as possible - there should be files named src and srcs (analogous to $SRC and $SRCS respectively) as well as srcs/xyz (analogous to $SRCS_XYZ for named sources), all in a top-level directory in the sandbox. Unsure what it should be called, but it should be short enough that it isn't a pain to keep writing or reading it in the command while being generic enough that we can put other stuff in there later if we feel we need to. I just did a quick search of the plugins and couldn't find any with commands that created a top-level directory named _plz, so perhaps that?

If we make this explicitly an opt-in feature, I don't think we need the environment variables at all (although perhaps it'd be neater to refer to $SRCSFILE_XYZ than $TMP_DIR/_plz/srcs/xyz)...
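
As a rough illustration of the file-based idea discussed above, the following Python sketch (not Please's actual implementation) writes one newline-delimited list file per would-be environment variable into a top-level sandbox directory and reads it back. Everything named here is an assumption: the `_plz` directory is only the candidate floated in the comment, the function names are hypothetical, and the flat file names (`SRCS`, `SRCS_XYZ`) follow the earlier "filenames matching the env var" suggestion because a path cannot be both a file `srcs` and a directory `srcs/`. The layout that was eventually merged may differ.

```python
import os
import tempfile


def write_source_lists(sandbox, srcs, named_srcs=None, dirname="_plz"):
    """Write one list file per would-be env var; return {var name: file path}."""
    out_dir = os.path.join(sandbox, dirname)
    os.makedirs(out_dir, exist_ok=True)
    lists = {"SRCS": list(srcs)}
    for name, files in (named_srcs or {}).items():
        lists["SRCS_" + name.upper()] = list(files)
    paths = {}
    for var, files in lists.items():
        path = os.path.join(out_dir, var)
        with open(path, "w", encoding="utf-8") as f:
            # One source per line, so paths containing spaces survive intact.
            f.writelines(src + "\n" for src in files)
        paths[var] = path
    return paths


def read_source_list(path):
    """Read a list file back into a list of source paths."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


with tempfile.TemporaryDirectory() as sandbox:
    written = write_source_lists(sandbox, ["a.go", "b c.go"], {"ok": ["a.go"]})
    print(read_source_list(written["SRCS"]))
```

Keeping each named list in its own file is what would let a command process two very long lists separately, which is the case raised earlier in the thread.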

cemeceme · 2025-08-01T12:47:29Z

Just my two cents, but isn't the issue that Please is trying to keep using bash for its actual command runner?
It would be a lot cleaner if Please could run commands directly, sidestepping environment variable length limits as well as string escaping, spaces in strings and other shell-related issues. Incidentally, that seems to be one of the main problems when trying to run Please where the root path contains spaces.
This discussion thread also brings up a valid point.

Alternatively, it may also make sense to have Please handle the string-building part in the first place. After all, Please already supports commands in the strings and, if I recall correctly, these are handled before the shell is spawned.

One downside I can think of with this suggestion is that you can no longer use --shell to interactively debug your rules, but even that could probably be handled in a user-friendly way (or by spawning a shell only for that specific case).
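
The escaping point can be illustrated with a small Python sketch (an assumption-laden illustration, not how Please runs commands): when a process is started from an argument vector rather than a shell string, arguments containing spaces, quotes or `$` reach the child unchanged. The helper names are hypothetical. Note that direct execution still passes arguments through the operating system's exec call, so overall size limits continue to apply; the sketch only covers escaping.

```python
import subprocess
import sys

# Child program: print each received argument on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def run_direct(argv):
    """Run argv without a shell and return the child's stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def echo_args(args):
    """Pass args to a child interpreter and return them as the child saw them."""
    return run_direct([sys.executable, "-c", CHILD, *args]).splitlines()


# Hypothetical arguments that would need quoting in a shell command string.
print(echo_args(["my dir/a file.go", "$HOME", "it's"]))
```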

rules/builtins.build_defs

src/build/build_step.go

src/build/incrementality.go

src/build/incrementality_test.go

src/core/build_env.go

src/remote/action.go

test/source_list_files/test_repo/BUILD.test

VERSION

toastwaffle mentioned this pull request Jul 28, 2025

Add support for large packages in go_repos please-build/go-rules#320

Merged

toastwaffle requested review from Tatskaari and chrisnovakovic July 28, 2025 15:57

chrisnovakovic reviewed Jul 31, 2025

VJftw mentioned this pull request Aug 29, 2025

feat: Make Build Entrypoint configurable #3400

Draft

toastwaffle force-pushed the large-packages branch from d5a7555 to 3834c62 Compare October 9, 2025 10:24

chrisnovakovic requested changes Oct 9, 2025

toastwaffle added 5 commits October 9, 2025 14:38

Add support for suppressing SRCS env variables (in the case of rules …

15ec6e6

…with large numbers of sources)

Update version

efcf7a0

Add new field to hash

0689575

Add source list files

0d39f85

Add e2e tests

df77c0f

toastwaffle force-pushed the large-packages branch from e1e0bbd to 8885276 Compare October 9, 2025 13:39

Rename NoSrcEnvVars to SrcListFiles

312c6eb

toastwaffle force-pushed the large-packages branch from 5d76e37 to 312c6eb Compare October 10, 2025 09:20

chrisnovakovic approved these changes Oct 10, 2025

Apply suggestions from code review

704123b

Co-authored-by: Chris Novakovic <chris@chrisn.me.uk>

toastwaffle merged commit dd8a199 into thought-machine:master Oct 10, 2025
13 checks passed

toastwaffle deleted the large-packages branch October 10, 2025 12:59
