tl;dr: I think we can improve on the lookup strategy for compile-time preferences used by the current implementation (#37595). In particular, can we make it independent of the content of Manifest.toml files?
Continuing the discussion in #37595 (comment), I think we need to explore different strategies of compile-time preference lookup for stacked environments before 1.6 is out and the spec is frozen.
(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl)
cc @fredrikekre @KristofferC
What is the motivation for compile-time preferences?
Before discussing how to look up preferences, I think it would be better to have a shared vision of the use-cases of compile-time preferences.
I imagine that a common example would be choosing some kind of default "backend" such as CPU vs GPU (JuliaLang/Pkg.jl#977). IIUC @timholy's ComputationalResources.jl achieves a similar effect with run-time @eval. FFTW's deps/build.jl uses a text file ~/.julia/prefs/FFTW to switch the provider of the external library. This can be migrated to the compile-time preferences system. It's also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant NANSAFE_MODE_ENABLED for adding debugging instructions.
I think another important use-case is handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (JuliaLang/Pkg.jl#458 and JuliaLang/Juleps#38) mentioned configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries from various sources. For example, libpython may come from a JLL, the OS's package manager, a custom build, conda, etc. Such a setting is inevitably machine-specific. Thus, recording such information in a Project.toml that is meant to be shared is a bad idea. At the same time, it is crucial to have per-project per-machine preferences in a self-contained file for reproducibility.
Are these good motivations? Can we agree that it's ideal to have (1) per-project machine-agnostic preferences and (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.
Strategies
There are various ways to look up preferences in stacked environments (i.e., Base.load_path()). To start the conversation, I discuss the following three strategies:
Strategy 1: First package hit in Manifest.toml files (current implementation as of #37595)
The current strategy for finding the preference for a package is to walk through load_path() one by one, find a manifest (environment) that includes the package, and look at the corresponding project file.
Strategy 2: First preference hit in Project.toml files
Search Project.toml files in load_path() and find the first Project.toml file with the preference of the target package.
Strategy 3: First package hit in Project.toml files
Search Project.toml files in load_path() and find the first Project.toml file with the target package.
Example
To illustrate the difference between these strategies, consider the following environment stack (i.e., Base.load_path() == [X, Y, Z])
- Project X: Project.toml has package A which has package B as a dependency (i.e., B is in Manifest.toml but not in Project.toml). Project.toml has no compile-preferences table.
- Project Y: Project.toml has the compile-preferences table for B. However, Project.toml's deps table does not contain B.
- Project Z: Project.toml has the compile-preferences table for B. Project.toml includes B in deps; i.e., the user ran pkg> add B while activating Z.
Strategy 1 finds the preferences for B in X (i.e., empty). Strategy 2 finds the preferences for B in Y. Strategy 3 finds the preferences for B in Z.
To summarize:
| Project | deps | compile-preferences | Manifest.toml | found by |
| --- | --- | --- | --- | --- |
| X | [A, ...] | empty | has B as an indirect dependency | Strategy 1 |
| Y | [...] | has B's preferences | has B as an indirect dependency | Strategy 2 |
| Z | [B] | has B's preferences | has B | Strategy 3 |
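The three lookups can be sketched on a toy model of the stack above. This is only an illustration: the Env struct, its field names, the helper functions, the extra package C in Y's deps, and the "backend" preference key are all made up for this sketch and are not Base internals.

```julia
# Toy model of an environment; field names are hypothetical, not Base internals.
struct Env
    name::String
    deps::Set{String}                      # packages listed under [deps] in Project.toml
    manifest::Set{String}                  # packages recorded in Manifest.toml
    prefs::Dict{String,Dict{String,Any}}   # compile-preferences tables in Project.toml
end

# Return the first environment in `stack` satisfying `pred`, or `nothing`.
first_env(pred, stack) = (i = findfirst(pred, stack); i === nothing ? nothing : stack[i])

# Strategy 1: first environment whose manifest contains the package.
strategy1(stack, pkg) = first_env(env -> pkg in env.manifest, stack)
# Strategy 2: first environment whose project file has preferences for the package.
strategy2(stack, pkg) = first_env(env -> haskey(env.prefs, pkg), stack)
# Strategy 3: first environment whose project file lists the package in deps.
strategy3(stack, pkg) = first_env(env -> pkg in env.deps, stack)

# Preferences recorded for `pkg` in the selected environment (empty if none).
prefs_of(env, pkg) =
    env === nothing ? Dict{String,Any}() : get(env.prefs, pkg, Dict{String,Any}())

# Hypothetical preference table for B; "backend" is a made-up key.
b_prefs(v) = Dict{String,Dict{String,Any}}("B" => Dict{String,Any}("backend" => v))

X = Env("X", Set(["A"]), Set(["A", "B"]), Dict{String,Dict{String,Any}}())
Y = Env("Y", Set(["C"]), Set(["C", "B"]), b_prefs("from Y"))
Z = Env("Z", Set(["B"]), Set(["B"]), b_prefs("from Z"))
stack = [X, Y, Z]

for (label, strategy) in [("Strategy 1", strategy1),
                          ("Strategy 2", strategy2),
                          ("Strategy 3", strategy3)]
    env = strategy(stack, "B")
    println(label, " reads ", env.name, ": ", prefs_of(env, "B"))
end
```

With this stack, the three functions select X, Y, and Z respectively, matching the "found by" column.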
Analysis
As I discussed in #37595 (comment), I think Strategy 1 (first package hit in manifests) is not desirable because the fact that package A depends on B is (usually) an implementation detail. Package A's author may silently drop B from its dependencies when bumping v1.1 to v1.2. Then, after Pkg.update, if nothing else in X depends on B, B disappears from X's Manifest.toml and Strategy 1 would pick up project Y as the source of preferences. OTOH, with Strategies 2 and 3, the user controls more explicitly which environment changes the preferences of a given package. I don't think it is ideal to rely on the state of Manifest.toml since it is a large file that is opaque to users and it is often not checked in to the version control system.
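A minimal sketch of this failure mode, assuming the X, Y, Z stack from the example. The NamedTuple layout, the helper names, and package C are hypothetical and only serve the illustration.

```julia
# Each environment is a NamedTuple; the layout is illustrative only.
env(name; deps, manifest) = (name = name, deps = deps, manifest = manifest)

# Strategy 1: first environment whose Manifest.toml contains `pkg`.
by_manifest(stack, pkg) = stack[findfirst(e -> pkg in e.manifest, stack)].name
# Strategy 3: first environment whose Project.toml deps contain `pkg`.
by_deps(stack, pkg) = stack[findfirst(e -> pkg in e.deps, stack)].name

Y = env("Y"; deps = ["C"], manifest = ["C", "B"])
Z = env("Z"; deps = ["B"], manifest = ["B"])

# Before the update: A v1.1 depends on B, so X's manifest contains B.
X_before = env("X"; deps = ["A"], manifest = ["A", "B"])
# After the update: A v1.2 no longer depends on B; X's Project.toml is unchanged.
X_after = env("X"; deps = ["A"], manifest = ["A"])

for (label, X) in [("A v1.1", X_before), ("A v1.2", X_after)]
    stack = [X, Y, Z]
    println(label, ": Strategy 1 reads ", by_manifest(stack, "B"),
            ", Strategy 3 reads ", by_deps(stack, "B"))
end
```

Nothing in any Project.toml changes between the two runs, yet the manifest-based lookup moves from X to Y, while the deps-based lookup stays on Z.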
Strategy 3 has one advantage over Strategy 2: the compatibility of the recorded preferences can be enforced via the compat entry. For example, the project can add a compat bound for the versions of the package that support the given preference. The only disadvantage of Strategy 3 compared to Strategy 2 that I can think of is that the user may end up with a "stale" package in Project.toml that they added just to configure a transitive dependency.
Alternative: shallow-merge all preference tables?
It's also conceivable to aggressively combine preference tables for a given package using merge(dicts...). That is to say, given

    [compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
    a = 1
    b = 2

and

    [compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
    a = 10
    c = 30

we'd have merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2)) (i.e., Dict("a" => 1, "b" => 2, "c" => 30)).
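As a rough sketch, the merge over a whole stack could look like the following. The shallow_merge helper is a hypothetical name, and the assumption that the first table comes from the earlier entry in load_path() is mine; the merge call itself is plain Base.

```julia
# Preference tables for one package, ordered as in load_path() (earliest first).
# Which environment holds which table is an assumption made for illustration.
tables = [
    Dict{String,Any}("a" => 1, "b" => 2),    # earlier environment
    Dict{String,Any}("a" => 10, "c" => 30),  # later environment
]

# `merge` lets later arguments win on key collisions, so the list is reversed
# to give earlier environments priority.
shallow_merge(tables) =
    isempty(tables) ? Dict{String,Any}() : merge(reverse(tables)...)

merged = shallow_merge(tables)
println(merged)
```

Here "a" keeps the value from the earlier environment, while "b" and "c" are each taken from the only table that defines them.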
Since this is a "shallow merge", each package can opt out of this behavior and use Strategy 2/3 by creating a sub-table explicitly:

    [compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences] # note `.preferences` suffix
    a = 1
    b = 2

and

    [compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences]
    a = 10
    c = 30
As long as the specification is clearly documented, the package authors can use the appropriate behavior.
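The opt-out works because merge only combines top-level keys. A small sketch with the two sub-tables above, where the variable names and the assumption about which table comes first in load_path() are mine:

```julia
# The two parsed tables, each wrapping its values in a "preferences" sub-table.
earlier = Dict("preferences" => Dict("a" => 1, "b" => 2))
later   = Dict("preferences" => Dict("a" => 10, "c" => 30))

# Shallow merge: the last argument wins for the single top-level key,
# and the nested dictionaries are not combined.
merged = merge(later, earlier)
println(merged)
```

The result contains only the earlier sub-table, so "c" from the later environment is dropped, which is the first-hit behavior of Strategy 2/3.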
Opinion
I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.
Appendix: Current implementation
The entry point for the precompilation cache manager is get_preferences_hash:

    function uuid_in_environment(project_file::String, uuid::UUID, cache::TOMLCache)
        # First, check to see if we're looking for the environment itself
        proj_uuid = get(parsed_toml(cache, project_file), "uuid", nothing)
        if proj_uuid !== nothing && UUID(proj_uuid) == uuid
            return true
        end

        # Check to see if there's a Manifest.toml associated with this project
        manifest_file = project_file_manifest_path(project_file, cache)
        if manifest_file === nothing
            return false
        end
        manifest = parsed_toml(cache, manifest_file)
        for (dep_name, entries) in manifest
            for entry in entries
                entry_uuid = get(entry, "uuid", nothing)::Union{String, Nothing}
                if uuid !== nothing && UUID(entry_uuid) == uuid
                    return true
                end
            end
        end
        # If all else fails, return `false`
        return false
    end

    # Find the Project.toml that we should load/store to for Preferences
    function get_preferences_project_path(uuid::UUID, cache::TOMLCache = TOMLCache())
        for env in load_path()
            project_file = env_project_file(env)
            if !isa(project_file, String)
                continue
            end
            if uuid_in_environment(project_file, uuid, cache)
                return project_file
            end
        end
        return nothing
    end

    function get_preferences(uuid::UUID, cache::TOMLCache = TOMLCache();
                             prefs_key::String = "compile-preferences")
        project_path = get_preferences_project_path(uuid, cache)
        if project_path !== nothing
            preferences = get(parsed_toml(cache, project_path), prefs_key, Dict{String,Any}())
            if haskey(preferences, string(uuid))
                return preferences[string(uuid)]
            end
        end
        # Fall back to default value of "no preferences".
        return Dict{String,Any}()
    end

    get_preferences_hash(uuid::UUID, cache::TOMLCache = TOMLCache()) = UInt64(hash(get_preferences(uuid, cache)))
The code above is taken from julia/base/loading.jl at commit 6596f95 (lines 325 to 348 and lines 1458 to 1484).