Skip to content

Bugs when Merging Configurations#2694

Closed
citibeth wants to merge 2 commits intospack:developfrom
citibeth:efischer/161230-FixConfigAliasing
Closed

Bugs when Merging Configurations#2694
citibeth wants to merge 2 commits intospack:developfrom
citibeth:efischer/161230-FixConfigAliasing

Conversation

@citibeth
Copy link
Copy Markdown
Member

The function _merge_yaml() was buggy because objects within two separate dict trees could be aliased together. In my example case, I had id(dest['packages']['all']['versions']) == id(dest['packages']['ibmisc']['versions']) because both were initially equal to the empty list []. When _merge_yaml() appended to one list, it appeneded to both --- producing surprising (and wrong) results.

@tgamblin Can you please check this PR to see if there are any other places that need a similar fix?

To see the bug in action without this patch, use the following packages.yaml files:

etc/spack/defaults/packages.yaml:
-----------------------------------
packages:
  ibmisc:

  all:
    compiler: [gcc, intel, pgi, clang, xl, nag]
    providers:
      mpi: [openmpi, mpich]
      blas: [openblas]
      lapack: [openblas]
      pil: [py-pillow]
~/.spack/packages.yaml
------------------------
packages:
  ibmisc:
    version: [develop]

Then try spack config get packages, which yields surprising results:

$ spack config get packages
packages:
  all:
    buildable: true
    compiler:
    - gcc
    - intel
    - pgi
    - clang
    - xl
    - nag
    modules: {}
    paths: {}
    providers:
      blas:
      - openblas
      lapack:
      - openblas
      mpi:
      - openmpi
      - mpich
      pil:
      - py-pillow
    version:
    - develop     # <<< This is wrong... should be []
  ibmisc:
    buildable: true
    compiler: []
    modules: {}
    paths: {}
    providers: {}
    version:
    - develop

This will cause the @develop version to be (wrongly) used on any package where it's possible; for example on hypre:

$ spack spec hypre
hypre@develop%gcc@4.9.3+internal-superlu+shared arch=linux-centos7-x86_64
    ^openblas@0.2.19%gcc@4.9.3~openmp+pic+shared arch=linux-centos7-x86_64
    ^openmpi@1.10.1%gcc@4.9.3~java~mxm~pmi~psm~psm2~slurm~sqlite3~thread_multiple~tm~verbs+vt arch=linux-centos7-x86_64
        ^hwloc@1.11.4%gcc@4.9.3 arch=linux-centos7-x86_64
            ^libpciaccess@0.13.4%gcc@4.9.3 arch=linux-centos7-x86_64
                ^libtool@2.4.6%gcc@4.9.3 arch=linux-centos7-x86_64
                    ^m4@1.4.17%gcc@4.9.3+sigsegv arch=linux-centos7-x86_64
                        ^libsigsegv@2.10%gcc@4.9.3 arch=linux-centos7-x86_64
                ^pkg-config@0.29.1%gcc@4.9.3+internal_glib arch=linux-centos7-x86_64
                ^util-macros@1.19.0%gcc@4.9.3 arch=linux-centos7-x86_64

@citibeth citibeth requested a review from tgamblin December 31, 2016 01:56
@citibeth citibeth mentioned this pull request Dec 31, 2016
@davydden
Copy link
Copy Markdown
Member

davydden commented Jan 3, 2017

can you add a regression test to make sure the bug won't re-appear when someone tried to refactor the code next time?

@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 3, 2017

I think that's a good suggestion, but I'm not interested in adding the tests myself. I found and fixed a bug in someone else's code, I would rather not get more involved if I don't have to. I'm also hoping the author of the code will check over my fix, maybe there are other places nearby where the same problem occurs that I missed.

@davydden
Copy link
Copy Markdown
Member

davydden commented Jan 3, 2017

but I'm not interested in adding the tests myself. I found and fixed a bug in someone else's code, I would rather not get more involved if I don't have to

fair enough.

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Jan 3, 2017

@citibeth: even a small regression test for the case you describe towards the beginning of the PR would help out other maintainers tremendously. The test could go in tests/config.py but doesn't have to use all the mock configs in there. You could just add a test_aliasing() function that calls merge_yaml with literal arguments known to produce a bad result and asserts the ids aren't the same in the result.

@citibeth citibeth changed the title Fix aliasing issues when merging configurations Bugs when Merging Configurations Jan 3, 2017
@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 3, 2017

@tgamblin I found a more serious problem merging configurations, I'm looking for your advice on it. The issue is with merging variants:

  1. The YAML grammar specifies they can be either lists or strings. 99% of users will probably unknowingly choose strings, but expect them to act like lists. For example, a user would do:

    mypackage:
       variants: +gui+python
    

    Suppose the user puts the following in a higher-precedence packages.yaml file:

    mypackage:
       variants: +mpi
    

    She will expect that the final default spec will be mypackage+gui+python+mpi; and will be VERY surprised to learn that it's actually +mpi.

  2. The ONLY way (that I know of) to avoid the problem in (1) is to make sure you always specify your variants as lists. You must use brackets and commas! For example, the following common mistake is WRONG:

    icebin:
        variants: [+gridgen +python +everytrace]
    

    Unfortunately, another trap to fall into.

  3. Supposing you've gotten past (1) and (2), now one ends up with real questions on HOW the overriding/merging is done:
    a) The Spack docs tell us: "For list-valued settings, Spack prepends higher-precedence settings to lower-precedence settings." Of course lists of variants are not really lists, they are more like sets or dicts. Supposing I try to concretize mypackage~mpi+mpi, what will the value of the mpi variant be in the end? Is the prepending rule wrong here, or is it right?
    b) Suppose a variant +var is set at a lower precedence, how do I turn it into ~var in an overriding packages.yaml file?
    c) Suppose a variant ~var is set at a lower precedence, how do I override it into +var in an overriding packages.yaml file?
    d) Suppose the variant var is set with ~var or +var, how do I clear it in the overriding file?
    e) Suppsoe the variant var is not mentioned, how do I set it to +var or ~var in the overriding file?
    Unfortunately, (e) is the only one of these questions with a clear answer at this point.

  4. How does any of this work for non-boolean variants?

I'm afraid that configuration merging has not been thought all the way through, at least as far as variants go. Do you have any suggestions / recommendations on finishing the task? For example:

  1. Should we eliminate the option to provide variants as strings, since that is such a booby trap? Or should we allow all sorts of stuff, and then post-process it (meaning, process after the YAML parser) into the list/set/dict that we need?

  2. What are the intended answers for questions (3) above?

@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 3, 2017

So for now, my package.yaml files look like this:

# Updates to <gissversions> to support two-way coupling
packages:
    boost:
        # icebin needs these variants; Just set them for everyone to
        # keep hashes stable.
        variants: [+filesystem,+date_time,+mpi]

    ibmisc:
        # icebin needs these variants
        variants: [+proj,+blitz,+netcdf,+boost,+udunits2,+python]

    pism:
        # icebin needs these variants
        variants: [~python]

    modele:
        variants: [+everytrace,+pnetcdf,+icebin,+mpi,+model]

@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 3, 2017

@tgamblin Given the issues uncovered here, I would recommend that we just consider configuration merging NOT ready for the 1.0 release. Sure, this PR can be merged, and it would do some good. But I believe we need more thought and more testing before it's really ready.

@adamjstewart
Copy link
Copy Markdown
Member

I discovered a new bug in the way packages.yaml is parsed. With no packages.yaml:

$ spack spec hdf5
...
hdf5@1.10.0-patch1%clang@8.0.0-apple+cxx~debug+fortran+mpi+pic+shared~szip~threadsafe arch=darwin-sierra-x86_64 
    ^openmpi@2.0.1%clang@8.0.0-apple~java~mxm~pmi~psm~psm2~slurm~sqlite3~thread_multiple~tm~verbs+vt arch=darwin-sierra-x86_64 
        ^hwloc@1.11.5%clang@8.0.0-apple arch=darwin-sierra-x86_64 
            ^libpciaccess@0.13.4%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^libtool@2.4.6%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                    ^m4@1.4.18%clang@8.0.0-apple+sigsegv arch=darwin-sierra-x86_64 
                        ^libsigsegv@2.10%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^pkg-config@0.29.1%clang@8.0.0-apple+internal_glib arch=darwin-sierra-x86_64 
                ^util-macros@1.19.0%clang@8.0.0-apple arch=darwin-sierra-x86_64 
    ^zlib@1.2.10%clang@8.0.0-apple+pic arch=darwin-sierra-x86_64 

With the following packages.yaml:

packages:
  all:
    variants: -mpi
$ spack spec hdf5
...
hdf5@1.10.0-patch1%clang@8.0.0-apple+cxx~debug+fortran~mpi+pic+shared~szip~threadsafe arch=darwin-sierra-x86_64 
    ^zlib@1.2.10%clang@8.0.0-apple+pic arch=darwin-sierra-x86_64 

With the following packages.yaml:

packages:
  all:
    variants: -mpi
  hdf5:
    variants: +szip
$ spack spec hdf5
...
hdf5@1.10.0-patch1%clang@8.0.0-apple+cxx~debug+fortran+mpi+pic+shared+szip~threadsafe arch=darwin-sierra-x86_64 
    ^openmpi@2.0.1%clang@8.0.0-apple~java~mxm~pmi~psm~psm2~slurm~sqlite3~thread_multiple~tm~verbs+vt arch=darwin-sierra-x86_64 
        ^hwloc@1.11.5%clang@8.0.0-apple arch=darwin-sierra-x86_64 
            ^libpciaccess@0.13.4%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^libtool@2.4.6%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                    ^m4@1.4.18%clang@8.0.0-apple+sigsegv arch=darwin-sierra-x86_64 
                        ^libsigsegv@2.10%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^pkg-config@0.29.1%clang@8.0.0-apple+internal_glib arch=darwin-sierra-x86_64 
                ^util-macros@1.19.0%clang@8.0.0-apple arch=darwin-sierra-x86_64 
    ^szip@2.1%clang@8.0.0-apple arch=darwin-sierra-x86_64 
    ^zlib@1.2.10%clang@8.0.0-apple+pic arch=darwin-sierra-x86_64 

That's weird. I would've expected the variants to be merged instead of overwritten.

@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 20, 2017 via email

@adamjstewart
Copy link
Copy Markdown
Member

@citibeth Unfortunately that doesn't work either. On develop, with a packages.yaml that looks like:

packages:
  all:
    variants: [-mpi]
  hdf5:
    variants: [+szip]

I'm still seeing:

$ spack spec hdf5
...
hdf5@1.10.0-patch1%clang@8.0.0-apple+cxx~debug+fortran+mpi+pic+shared+szip~threadsafe arch=darwin-sierra-x86_64 
    ^openmpi@2.0.1%clang@8.0.0-apple~java~mxm~pmi~psm~psm2~slurm~sqlite3~thread_multiple~tm~verbs+vt arch=darwin-sierra-x86_64 
        ^hwloc@1.11.5%clang@8.0.0-apple arch=darwin-sierra-x86_64 
            ^libpciaccess@0.13.4%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^libtool@2.4.6%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                    ^m4@1.4.18%clang@8.0.0-apple+sigsegv arch=darwin-sierra-x86_64 
                        ^libsigsegv@2.10%clang@8.0.0-apple arch=darwin-sierra-x86_64 
                ^pkg-config@0.29.1%clang@8.0.0-apple+internal_glib arch=darwin-sierra-x86_64 
                ^util-macros@1.19.0%clang@8.0.0-apple arch=darwin-sierra-x86_64 
    ^szip@2.1%clang@8.0.0-apple arch=darwin-sierra-x86_64 
    ^zlib@1.2.10%clang@8.0.0-apple+pic arch=darwin-sierra-x86_64 

@citibeth
Copy link
Copy Markdown
Member Author

citibeth commented Jan 20, 2017 via email

@citibeth
Copy link
Copy Markdown
Member Author

Fixed in #7959

@citibeth citibeth closed this May 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants