Skip to content

Automatically inject dependencies on sibling files when dependency inference is unused#10582

Merged
Eric-Arellano merged 4 commits intopantsbuild:masterfrom
Eric-Arellano:sibling-deps
Aug 11, 2020
Merged

Automatically inject dependencies on sibling files when dependency inference is unused#10582
Eric-Arellano merged 4 commits intopantsbuild:masterfrom
Eric-Arellano:sibling-deps

Conversation

@Eric-Arellano
Copy link
Copy Markdown
Contributor

@Eric-Arellano Eric-Arellano commented Aug 10, 2020

Problem

As of #10511, we now use file addresses / generated subtargets everywhere, except for the graph introspection rules.

By default, generated subtargets do not depend on the sibling files in their base target, i.e. the other files in the original sources field. Those dependencies will only be included if dependency inference added it, or if the user explicitly added something like ./sibling.txt to the base target's dependencies field.

We generally do not want to automatically depend on sibling files. This would mean that file addresses (and file dependencies) do not actually have any benefit, because ./pants dependencies $file would resolve the same results as ./pants dependencies $sibling, i.e. we would not get finer-grained caching. Further, the conceptual model of "targets as metadata applied to files"—rather than "a collection of files with metadata"—would imply that we should not depend on sibling files automatically.

However, for users who are not using dep inference and/or explicit file deps, many things will break if we don't automatically depend on sibling files. In Pants 1.x, it was a safe assumption that you could access any file in the target's sources field without having to add a dependency. It's majorly disruptive to break this assumption, and in effect, they would need to use dependency inference to be ergonomic.

Solution

Unless --python-infer-imports is True, and the target in question has PythonSources, then we will automatically add dependencies on all sibling files for file addresses.

This means that every non-Python target (e.g. files, bash_library) will depend on its siblings, regardless of using dependency inference.

[ci skip-rust]
[ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Copy Markdown
Contributor Author

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As argued in the PR description, I'm pretty sure that a global option is the correct approach, rather than trying to do something based on what inference options are set.

There is an awkward edge that any languages provided via plugins won't work very well if they don't have dependency inference implemented. For example, for the Bash plugin, we would probably want to set --files-depend-on-target-siblings. But for Python, we would not want this option. Instead, we'd want a per-language option.

prelude_glob_patterns: Tuple[str, ...]
build_patterns: Tuple[str, ...]
build_ignore_patterns: Tuple[str, ...]
subproject_roots: Tuple[str, ...]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option was being accessed by looking at GlobalOptions, rather than AddressMapper. That works well for us because we can now move that option out of the bootstrap options, and it makes it easier for us to delete AddressMapper.

Copy link
Copy Markdown
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Change looks good, but:

There is an awkward edge that any languages provided via plugins won't work very well if they don't have dependency inference implemented. For example, for the Bash plugin, we would probably want to set --files-depend-on-target-siblings. But for Python, we would not want this option. Instead, we'd want a per-language option.

This would seem to argue for having a per-backend option, or for using the inference option? Depending on how you looked up the per-language option to use (...based on inference @unions for the language, perhaps?), the implication would be that not having inference (enabled) for a language should trigger this behavior?

We no longer have a reliable mechanism to determine if dependency inference is used in internals/graph.py, now that the global option --dependency-inference is removed.

It doesn't sound like it should be global, given the above.

@Eric-Arellano
Copy link
Copy Markdown
Contributor Author

up the per-language option to use (...based on inference @Unions for the language, perhaps?)

It's unfortunately not that simple. If you have the Python backend enabled, but all the [python-infer] options as false, graph.py will still thing that inference is being used because all 3 of those inference rules will be registered. Their implementation will no-op, but graph.py doesn't see the internals of the rules.

The only solution I can think of is to have a rule that will look up the target type and return a boolean for if this option should be used or not? We would generate subtargets for files() and resources(), for example, but not python_library().

I talked myself out of that though because it's really complex. I think this is fine for now and we can make something more clever when we add more languages + get user feedback. Thoughts?

@stuhood
Copy link
Copy Markdown
Member

stuhood commented Aug 10, 2020

It's unfortunately not that simple. If you have the Python backend enabled, but all the [python-infer] options as false, graph.py will still thing that inference is being used because all 3 of those inference rules will be registered. Their implementation will no-op, but graph.py doesn't see the internals of the rules.

There are probably lots of changes that we could make to those unions, but one option would be for the InferredDependencies return type to gain a boolean that indicated whether target file-intra-dependencies should be generated... and then, if not inference_rules or any(id.should_add_intra for id in inferred_dependencies).

The only solution I can think of is to have a rule that will look up the target type and return a boolean for if this option should be used or not? We would generate subtargets for files() and resources(), for example, but not python_library().

Yea, that'd work too.

@Eric-Arellano
Copy link
Copy Markdown
Contributor Author

but one option would be for the InferredDependencies return type to gain a boolean

That might work. Would we want this logic if any of the 3 inference rules are used, or only import inference?

@stuhood
Copy link
Copy Markdown
Member

stuhood commented Aug 10, 2020

but one option would be for the InferredDependencies return type to gain a boolean

That might work. Would we want this logic if any of the 3 inference rules are used, or only import inference?

Only imports.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Copy Markdown
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for iterating. Description/title need updates.

literal_ignored_addresses = set(
await MultiGet(Get(Address, AddressInput, ai) for ai in provided.ignored_addresses)
# If this is a base target, or no dependency inference implementation can infer dependencies on
# a file address's sibling files, then we inject dependencies on all the base target's
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

…dencies_inferrable`

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano changed the title Add --files-depend-on-target-siblings option Automatically inject dependencies on sibling files when dependency inference is unused Aug 10, 2020
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage remained the same at 0.0% when pulling ad1d271 on Eric-Arellano:sibling-deps into 57a4745 on pantsbuild:master.

@Eric-Arellano Eric-Arellano merged commit 3c03321 into pantsbuild:master Aug 11, 2020
@Eric-Arellano Eric-Arellano deleted the sibling-deps branch August 11, 2020 00:31
@Eric-Arellano
Copy link
Copy Markdown
Contributor Author

A year later, this was a good start, but too course grained. The impact of this change is that using a file-level dependency for a resources() target will not actually be any different than depending on all the resources(). Even though there's no need for the individual resource files to depend on each other.

Instead, we should probably have targets express what semantics make sense for them somehow.

@Eric-Arellano
Copy link
Copy Markdown
Contributor Author

Oh!! The target generation rules should do this. Using the design from https://docs.google.com/document/d/1HpJn2jTWf5sKob6zhe4SqHZ7KWBlX4a6OJOwnuw-IHo/edit#, the rule to generate file from files targets would do this for example. Get rid of magical handling in engine/internals/graph.py.

Will do in a followup.

Eric-Arellano added a commit that referenced this pull request Sep 17, 2021
…r generated file-level targets (#12906)

Due to #10582, we automatically were adding dependencies on all sibling files from a target generator (aka "build target") if that target did not have dependency inference, or its inference did not support inferring deps on siblings. This is important because users expect with languages like Python that every file in `python_library()` would automatically depend on every other file if it weren't for dependency inference.

But our heuristic was not fully accurate. For example, `files()` and `resources()` should never infer deps on siblings because there is no notion of a file depending on another file. If you need all the files, depend on the original `files()` target: only use file targets if you just need that single file. (That will be fixed in a followup.)

[ci skip-build-wheels]
[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants