Skip to content

Conversation

@toastwaffle
Copy link
Contributor

@toastwaffle toastwaffle commented Jul 28, 2025

Instead, we write files to the build environment containing the list of sources which would otherwise be in the environment variables.

Copy link
Contributor

@chrisnovakovic chrisnovakovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the use case for this, but I feel that suppressing the list of inputs to a build target goes against one of the core tenets of Please, namely that build environments are explicitly told which files they must output given a list of files as inputs. The problem here is that the input list can exceed the maximum length of an environment variable value when stringified, but I'm not convinced that the solution is to eliminate that information entirely - I think it'd be much better to provide it via a mechanism that isn't subject to the constraints imposed on environment variables, such as in a file.

I'm also not keen on the way this impacts all named sources, even those that would be short enough to fit inside an environment variable value when stringified - consider how difficult it would be to express the cmd for a target with the following sources:

genrule(
   # ...
   srcs = {
        "excessive": glob([
            # matches a huge number of files whose length exceeds `ARG_MAX`
        ]),
        "ok": [
            # a smaller but substantial list of specific file names, not a glob
        ],
    },
    # ...
)

Where did we land on allowing targets to opt in to receiving source file lists in a file rather than an environment variable (suggested in #3390)? I think that's an all-round better approach.

@toastwaffle
Copy link
Contributor Author

I understand the use case for this, but I feel that suppressing the list of inputs to a build target goes against one of the core tenets of Please, namely that build environments are explicitly told which files they must output given a list of files as inputs. The problem here is that the input list can exceed the maximum length of an environment variable value when stringified, but I'm not convinced that the solution is to eliminate that information entirely - I think it'd be much better to provide it via a mechanism that isn't subject to the constraints imposed on environment variables, such as in a file.

I'm also not keen on the way this impacts all named sources, even those that would be short enough to fit inside an environment variable value when stringified - consider how difficult it would be to express the cmd for a target with the following sources:

genrule(
   # ...
   srcs = {
        "excessive": glob([
            # matches a huge number of files whose length exceeds `ARG_MAX`
        ]),
        "ok": [
            # a smaller but substantial list of specific file names, not a glob
        ],
    },
    # ...
)

Where did we land on allowing targets to opt in to receiving source file lists in a file rather than an environment variable (suggested in #3390)? I think that's an all-round better approach.

I think it's reasonable for a target to be able to declare explicitly that it does not need the environment variables that would be generated (hence the move away from making please suppressing the variables when there were just too many sources). Is your concern that having this option would open the gate to other build defs abusing it?

I think the problem we're trying to fix is an exceptional case for which I expect this escape hatch to be used very rarely, and I don't think it justifies the additional effort to add support for putting the build environment into files (and the potential sharp edges that could come with doing that), especially when we have no other use case for it and a working solution for Go.

@chrisnovakovic
Copy link
Contributor

I think it's reasonable for a target to be able to declare explicitly that it does not need the environment variables that would be generated (hence the move away from making please suppressing the variables when there were just too many sources). Is your concern that having this option would open the gate to other build defs abusing it?

Not abusing it, per se (it's opt-in, after all), but I can see other situations where downstream users might run into the same (or a similar) problem but in a way that can't be worked around with this approach. The long/short named sources example above is one of them; another is one in which two excessively long lists of sources are presented to the build environment and the command is still somehow expected to process them separately. We'd then have to think carefully about how to solve that problem in a way that's backwards-compatible with what's being proposed here.

I think the problem we're trying to fix is an exceptional case for which I expect this escape hatch to be used very rarely, and I don't think it justifies the additional effort to add support for putting the build environment into files (and the potential sharp edges that could come with doing that), especially when we have no other use case for it and a working solution for Go.

I think this is actually the crux of my concern: it feels like a highly specific solution developed in response to a highly specific problem. I appreciate the rarity of the problem - after all, we haven't encountered any other reports of this in nearly a decade of public Please development - but I feel that any solution implemented within the main Please code base ought to be generic enough to solve similar problems that might crop up in future, however unlikely we think they might be right now.

@toastwaffle
Copy link
Contributor Author

Okay, colour me convinced :) do you have any suggestions for where to put those files in the sandbox? I was thinking something like <sandbox_dir>/__plz_build_environment/SRCS (i.e. filenames matching what the env var would otherwise be called). And do you think it should be SRCS_GO_FILE, or SRCSFILE_GO, or SRCS_FILE_GO?

@chrisnovakovic
Copy link
Contributor

Okay, colour me convinced :) do you have any suggestions for where to put those files in the sandbox? I was thinking something like <sandbox_dir>/__plz_build_environment/SRCS (i.e. filenames matching what the env var would otherwise be called). And do you think it should be SRCS_GO_FILE, or SRCSFILE_GO, or SRCS_FILE_GO?

IMO we should keep the directory structure as "Please-like" as possible - there should be files named src and srcs (analogous to $SRC and $SRCS respectively) as well as srcs/xyz (analogous to $SRCS_XYZ for named sources), all in a top-level directory in the sandbox. Unsure what it should be called, but it should be short enough that it isn't a pain to keep writing or reading it in the command while being generic enough that we can put other stuff in there later if we feel we need to. I just did a quick search of the plugins and couldn't find any with commands that created a top-level directory named _plz, so perhaps that?

If we make this explicitly an opt-in feature, I don't think we need the environment variables at all (although perhaps it'd be neater to refer to $SRCSFILE_XYZ than $TMP_DIR/_plz/srcs/xyz)...

@cemeceme
Copy link
Contributor

cemeceme commented Aug 1, 2025

Just my two cents, but isn't the issue that please is trying to keep using bash for its actual command runner?
It would be a lot cleaner if please could run commands directly, sidestepping environment variable lengths limits as well as things like having to deal with string escaping, spaces in the strings and other shell related issues. Incidentally, that seems to be one of the main issues when trying to run please, where the root path contains spaces.
This discussion thread also brings up a valid point.

Alternatively, it may also make sense to move the string building part to be handled by please in the first place. After all, please already supports commands in the strings and if I recall correctly, these are handled before the shell is spawned.

One downside I can think of with this suggestion is that you can no longer use --shell to interactively debug your rules, but even that could probably be handled in a user friendly way (or spawn a shell only for that specific case).

Co-authored-by: Chris Novakovic <chris@chrisn.me.uk>
@toastwaffle toastwaffle merged commit dd8a199 into thought-machine:master Oct 10, 2025
13 checks passed
@toastwaffle toastwaffle deleted the large-packages branch October 10, 2025 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants