Switch to a better lookup strategy for compile-time preferences in stacked environments before releasing 1.6? #37791

**tl;dr**: I think we can improve on the lookup strategy for compile-time preferences currently implemented in #37595. In particular, can we make it independent of the content of `Manifest.toml` files?

Continuing the discussion in https://github.com/JuliaLang/julia/pull/37595#discussion_r488976715, I think we need to explore different strategies of compile-time preference lookup for stacked environments before 1.6 is out and the spec is frozen.

(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl)

cc @fredrikekre @KristofferC

### What is the motivation of compile-time preference?

Before discussing how to look up preferences, I think it would be better to have a shared vision of the use-cases of compile-time preferences.

I imagine that a common example would be for choosing some kind of default "backend" such as CPU vs GPU https://github.com/JuliaLang/Pkg.jl/pull/977. IIUC @timholy's [ComputationalResources.jl](https://github.com/timholy/ComputationalResources.jl) achieves a similar effect with run-time `@eval`. FFTW's [deps/build.jl](https://github.com/JuliaMath/FFTW.jl/blob/v1.2.4/deps/build.jl) uses a text file `~/.julia/prefs/FFTW` to switch the provider of the external library. This can be migrated to the compile-time preferences system. It's also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant [`NANSAFE_MODE_ENABLED`](http://www.juliadiff.org/ForwardDiff.jl/v0.10.8/user/advanced/#Fixing-NaN/Inf-Issues-1) for adding debugging instructions.

I think another important use-case is handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (https://github.com/JuliaLang/Pkg.jl/issues/458 and https://github.com/JuliaLang/Juleps/issues/38) mentioned configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries from various sources. For example, libpython may come from a JLL, the OS's package manager, a custom build, conda, etc. Such settings are inevitably machine-specific. Thus, recording such information in a `Project.toml` that is meant to be shared is a bad idea. At the same time, it is crucial to have per-project per-machine preferences in a self-contained file for reproducibility.

Are these good motivations? Can we agree that it's ideal to have (1) per-project machine-agnostic preferences _and_ (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.

### Strategies

There are various ways to look up preferences in stacked environments (i.e., `Base.load_path()`). To start the conversation, I discuss the following three strategies:

#### Strategy 1: First package hit in `Manifest.toml` files (current implementation as of #37595)

The current strategy for finding the preference for a package is to walk through `load_path()` one by one, find a manifest (environment) that includes the package, and look at the corresponding project file.

#### Strategy 2: First preference hit in `Project.toml` files

Search `Project.toml` files in `load_path()` and find the first `Project.toml` file with the preference of the target package.

#### Strategy 3: First package hit in `Project.toml` files

Search `Project.toml` files in `load_path()` and find the first `Project.toml` file with the target package.

### Example

To illustrate the difference between these strategies, consider the following environment stack (i.e., `Base.load_path() == [X, Y, Z]`)

* Project `X`: `Project.toml` has package `A` which has package `B` as a dependency (i.e., `B` is in `Manifest.toml` but not in `Project.toml`). `Project.toml` has no compile-preferences table.
* Project `Y`: `Project.toml` has the compile-preferences table for `B`. However, `Project.toml`'s `deps` table does not contain `B`.
* Project `Z`: `Project.toml` has the compile-preferences table for `B`. `Project.toml` includes `B` in `deps`; i.e., the user ran `pkg> add B` while activating `Z`.

Strategy 1 finds the preferences for `B` in `X` (i.e., empty). Strategy 2 finds the preferences for `B` in `Y`. Strategy 3 finds the preferences for `B` in `Z`.

To summarize:

| Project | `deps` | `compile-preferences` | `Manifest.toml` | found by |
| --- | --- | --- | --- | --- |
| X | `[A, ...]` | empty | has `B` as an indirect dependency | Strategy 1 |
| Y | `[...]` | has `B`'s preferences | has `B` as an indirect dependency | Strategy 2 |
| Z | `[B]` | has `B`'s preferences | has `B` | Strategy 3 |
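
The example above can be sketched as a small toy model. This is only an illustration under stated assumptions: `ToyEnv`, `first_env`, the `strategy*` functions, package `C`, and the `"backend"` preference are hypothetical names made up for this sketch, and none of it reflects how `Base` actually represents environments.

```julia
# Toy model of an environment (hypothetical; NOT the representation used by Base).
struct ToyEnv
    name::String
    deps::Vector{String}                    # direct dependencies in Project.toml
    prefs::Dict{String,Dict{String,Any}}    # compile-preferences table, keyed by package
    manifest::Vector{String}                # every package recorded in Manifest.toml
end

# Return the name of the first environment in the stack satisfying `pred`.
function first_env(pred, stack)
    i = findfirst(pred, stack)
    return i === nothing ? nothing : stack[i].name
end

# Strategy 1: first environment whose Manifest.toml contains the package.
strategy1(stack, pkg) = first_env(env -> pkg in env.manifest, stack)
# Strategy 2: first environment whose Project.toml has preferences for the package.
strategy2(stack, pkg) = first_env(env -> haskey(env.prefs, pkg), stack)
# Strategy 3: first environment whose Project.toml lists the package in `deps`.
strategy3(stack, pkg) = first_env(env -> pkg in env.deps, stack)

bprefs = Dict{String,Any}("backend" => "gpu")   # hypothetical preference for B

# `C` stands in for whatever direct dependency of Y pulls in B indirectly.
X = ToyEnv("X", ["A"], Dict{String,Dict{String,Any}}(), ["A", "B"])
Y = ToyEnv("Y", ["C"], Dict("B" => bprefs), ["C", "B"])
Z = ToyEnv("Z", ["B"], Dict("B" => bprefs), ["B"])
stack = [X, Y, Z]   # plays the role of Base.load_path()

println(strategy1(stack, "B"))  # X
println(strategy2(stack, "B"))  # Y
println(strategy3(stack, "B"))  # Z
```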


### Analysis

As I discussed in https://github.com/JuliaLang/julia/pull/37595#discussion_r490708451, I think Strategy 1 (First package hit in manifests) is not desirable because the fact that package `A` depends on `B` is (usually) an implementation detail. Package `A`'s author may silently drop `B` from its dependencies when bumping v1.1 to v1.2. Then, after `Pkg.update`, Strategy 1 would pick up project `Y` as the source of preferences. OTOH, with Strategy 2 and 3, the user controls more explicitly which environment changes the preferences of a given package. I don't think it is ideal to rely on the state of `Manifest.toml` since it is a large file that is opaque to users and is often not checked in to version control.
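
To make the fragility concrete, here is a minimal sketch of a manifest-based lookup before and after `A` drops `B`. The function `manifest_lookup`, the package `C`, and the manifest contents are assumptions made up for illustration; they follow the X/Y/Z example above rather than any real environment.

```julia
# Each environment is modelled as `name => packages recorded in its Manifest.toml`.
# This mimics Strategy 1 only; it is a hypothetical sketch, not Base's implementation.
function manifest_lookup(stack, pkg)
    for (name, manifest) in stack
        pkg in manifest && return name
    end
    return nothing
end

# A v1.1 depends on B, so B appears in X's manifest.
before = ["X" => ["A", "B"], "Y" => ["C", "B"], "Z" => ["B"]]
# A v1.2 silently drops B; after `Pkg.update`, B is gone from X's manifest.
after = ["X" => ["A"], "Y" => ["C", "B"], "Z" => ["B"]]

println(manifest_lookup(before, "B"))  # X
println(manifest_lookup(after, "B"))   # Y
```

No `Project.toml` changed between the two states, yet the environment that supplies `B`'s preferences did.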

Strategy 3 has an advantage over Strategy 2: the compatibility of the recorded preferences can be enforced via the `compat` entry. For example, a `compat` bound can restrict the package to the versions that support the given preference. The only disadvantage of Strategy 3 compared to Strategy 2 that I can think of is that the user may end up with a "stale" package in `Project.toml` that they added just to configure a transitive dependency.

### Alternative: shallow-merge all preference tables?

It's also conceivable to aggressively combine preference tables for a given package using `merge(dicts...)`.  That is to say, given

```toml
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 1
b = 2
```

and

```toml
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 10
c = 30
```

we'd have `merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2))` (i.e., `Dict("a" => 1, "b" => 2, "c" => 30)`).
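
As a quick check of the precedence, the following sketch merges a list of tables ordered like `load_path()`. It assumes that the first table above comes from the environment that appears earlier in the stack; the variable names are made up for illustration.

```julia
# Tables ordered as in load_path(): earlier entries should take precedence.
tables = [
    Dict("a" => 1, "b" => 2),    # from the earlier environment (assumption)
    Dict("a" => 10, "c" => 30),  # from the later environment (assumption)
]

# `merge` lets later arguments win, so reverse the stack order before merging.
merged = merge(reverse(tables)...)

println(merged == Dict("a" => 1, "b" => 2, "c" => 30))  # true
```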

Since this is a "shallow merge", each package can opt out of this behavior and use Strategy 2/3 by creating a sub-table explicitly:

```toml
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences] # note `.preferences` suffix
a = 1
b = 2
```

and

```toml
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences]
a = 10
c = 30
```

As long as the specification is clearly documented, the package authors can use the appropriate behavior.
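
A small sketch of why the sub-table opts out: `merge` only combines top-level keys, so a nested `preferences` table is replaced as a whole instead of being combined key by key. The dictionaries below mirror the two TOML tables above after parsing; the variable names are assumptions for illustration.

```julia
# Parsed form of the two tables with the `.preferences` sub-table.
earlier = Dict{String,Any}("preferences" => Dict("a" => 1, "b" => 2))
later   = Dict{String,Any}("preferences" => Dict("a" => 10, "c" => 30))

# Shallow merge: the earlier environment wins for the single top-level key.
merged = merge(later, earlier)

println(merged["preferences"] == Dict("a" => 1, "b" => 2))  # true
println(haskey(merged["preferences"], "c"))                 # false
```

The key `c` from the later environment does not leak into the result, which matches the "first hit wins" behavior of Strategy 2/3.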

### Opinion

I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.

### Appendix: Current implementation

The entry point for the precompilation cache manager is `get_preferences_hash`

https://github.com/JuliaLang/julia/blob/6596f95226e59bc0bc34bda3ffd389f08fca7e24/base/loading.jl#L325-L348

https://github.com/JuliaLang/julia/blob/6596f95226e59bc0bc34bda3ffd389f08fca7e24/base/loading.jl#L1458-L1484



```julia
function uuid_in_environment(project_file::String, uuid::UUID, cache::TOMLCache)
    # First, check to see if we're looking for the environment itself
    proj_uuid = get(parsed_toml(cache, project_file), "uuid", nothing)
    if proj_uuid !== nothing && UUID(proj_uuid) == uuid
        return true
    end

    # Check to see if there's a Manifest.toml associated with this project
    manifest_file = project_file_manifest_path(project_file, cache)
    if manifest_file === nothing
        return false
    end
    manifest = parsed_toml(cache, manifest_file)
    for (dep_name, entries) in manifest
        for entry in entries
            entry_uuid = get(entry, "uuid", nothing)::Union{String, Nothing}
            if uuid !== nothing && UUID(entry_uuid) == uuid
                return true
            end
        end
    end
    # If all else fails, return `false`
    return false
end

# Find the Project.toml that we should load/store to for Preferences
function get_preferences_project_path(uuid::UUID, cache::TOMLCache = TOMLCache())
    for env in load_path()
        project_file = env_project_file(env)
        if !isa(project_file, String)
            continue
        end
        if uuid_in_environment(project_file, uuid, cache)
            return project_file
        end
    end
    return nothing
end

function get_preferences(uuid::UUID, cache::TOMLCache = TOMLCache();
                         prefs_key::String = "compile-preferences")
    project_path = get_preferences_project_path(uuid, cache)
    if project_path !== nothing
        preferences = get(parsed_toml(cache, project_path), prefs_key, Dict{String,Any}())
        if haskey(preferences, string(uuid))
            return preferences[string(uuid)]
        end
    end
    # Fall back to default value of "no preferences".
    return Dict{String,Any}()
end
get_preferences_hash(uuid::UUID, cache::TOMLCache = TOMLCache()) = UInt64(hash(get_preferences(uuid, cache)))
```
