module files: restricted token expansion + case sensitivity by alalazo · Pull Request #5474 · spack/spack

alalazo · 2017-09-26T06:24:30Z

closes #2884
closes #4684

This PR restricts the token that can be expanded in the module file naming scheme and in the name of an environment variable to just PACKAGE, VERSION, COMPILER, COMPILERNAME, COMPILERVER and ARCHITECTURE (as discussed in #2884). It does so extending Spec.format with an optional parameter that contains a list of transformation functions (one for each token, the default being the identity function).

This new mechanism is then exploited to solve #4684. This PR changes the behavior for the expansion of tokens in environment variable names so that it respects the case sensitivity of the literal parts (token expanded in that context are still made uppercase).

PR in action

With a modules.yaml that looks like:

modules:
  tcl:
    naming_scheme: '${PACKAGE}/${VERSION}-${COMPILERNAME}-${COMPILERVER}-${OPTIONS}'

where OPTIONS is an invalid token, we get a warning and the module file is not written:

$ spack install libszip
==> Installing libszip
==> Using cached archive: /home/mculpo/PycharmProjects/spack/var/spack/cache/libszip/libszip-2.1.1.tar.gz
==> Staging archive: /home/mculpo/PycharmProjects/spack/var/spack/stage/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5/szip-2.1.1.tar.gz
==> Created stage in /home/mculpo/PycharmProjects/spack/var/spack/stage/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5
==> No patches needed for libszip
==> Building libszip [AutotoolsPackage]
==> Executing phase: 'autoreconf'
==> Executing phase: 'configure'
==> Executing phase: 'build'
==> Executing phase: 'install'
==> Warning: cannot perform the requested write operation on module files [token OPTIONS cannot be part of the module naming scheme]
==> Successfully installed libszip
  Fetch: 0.01s.  Build: 5.58s.  Total: 5.59s.
[+] /home/mculpo/PycharmProjects/spack/opt/spack/linux-ubuntu14.04-x86_64/gcc-7.2.0/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5

The same holds true for:

$ spack module refresh -m tcl libszip
==> You are about to regenerate tcl module files for:

-- linux-ubuntu14.04-x86_64 / gcc@4.8 ---------------------------
qabk3nm libszip@2.1.1

-- linux-ubuntu14.04-x86_64 / gcc@7.2.0 -------------------------
bg2wewq libszip@2.1.1

==> Do you want to proceed? [y/n] y
==> Error: token OPTIONS cannot be part of the module naming scheme

There are a few lines of comments in the commit messages, so in case we should rebase and merge this (instead of squashing it)?

ax3l

thanks a lot!
I currently have no time to RT test it, but the code looks good!

scheibelp

Main issue: I don't necessarily need this to change but I did want to discuss it: this seems like it's more complex than it needs to be. The logic you added supports more-general use cases like special-case logic for expressing spec parameters as strings (by customizing the functions in the dictionary). However, for all the added logic and testing in this PR, I think a simple whitelist of strings (along with a customized error message for tokens outside that list) would serve just as well. Was there a reason you went for a solution with more functionality (and complexity)?

Also: I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

scheibelp · 2017-09-30T01:15:33Z

-                        write(fmt % str(self.variants), '+')
+                    # Not all concretized specs have variants, but we still
+                    # want to run the transformation function on the variants
+                    if hasattr(self, 'variants'):


If bool(self.variants) == False then str(variants) == "" - are you saying that the transformation function may still want to output something when given ""? What makes this special vs. Spec.compiler?

Are you saying that the transformation function may still want to output something when given ""?

Yes. The point for any hook using Spec.format is that in the context of a concretized spec, there's always a compiler but not always a set of variants (for instance libszip has no variants). Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

I think the code in Spec.format can probably benefit from a refactoring, but here I just tried to do the minimal changes needed to fix the two issues i.e.:

make uppercase only the named tokens in the environment variable names

report any use of invalid tokens

there's always a compiler but not always a set of variants (for instance libszip has no variants)

That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.

Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.

Furthermore, regarding a request I made previously:

I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.

In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

That's actually a fair point. I restored the previous check in Spec.format, and scan for invalid tokens before calling it in the spack.modules subpackage. The solution is slightly different from what you propose but is, I think, along the same line.

Are there other side effects you anticipate wanting?

Maybe substitute a token with a default string if nothing is there? Right now a format like:

'${PACKAGE}-${OPTIONS}-${VERSION}'

for e.g. libszip@2.1.1 will return:

'libszip--2.1.1'

and there's no way to get the ${OPTIONS} part evaluated to a default, like:

'libszip-None-2.1.1'

Anyhow, just thinking ahead for possible use cases: I don't need this functionality right now.

scheibelp · 2017-09-30T01:21:47Z

+        #
+        # The expression `lambda: lambda x: x` is the shortest and neatest
+        # way to have a factory of identity functions in python
+        tr = collections.defaultdict(lambda: lambda x: x)


(this applies if you disagree that a whitelist would be more suitable)

I'd prefer this had a more descriptive name like token_transforms and right after named_str is defined below do something like:

token_transform = token_transforms.get(named_str)

Also I think doing the following would avoid the defaultdict with lambda: lambda x: x:

token_transform = token_transforms.get(named_str, lambda x: x)

It won't be a problem to change the naming, if if you strongly disagree with it. However: here the scope of the function is relatively contained, and making this name longer will give a lot more line breaks due to pep8. I tried having a longer name along the line of what you suggest, but in the end I went back to this short one because I was not convinced there was any readability improvement.

Also I think doing the following would avoid the defaultdict

Yes. That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

making this name longer will give a lot more line breaks due to pep8

It would if you edit it like:

write(fmt % token_transforms[named_str](self.name), '@')

however I am suggesting that you edit it like:

named_str = named_str.upper() transform = token_transforms[named_str]

and later on

write(fmt % transform(self.name), '@')

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

I think

token_transform = token_transforms.get(named_str, lambda x: x)

is clearer than

tr = collections.defaultdict(lambda: lambda x: x)

the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).

Done - to be honest, despite I see no harm in this transformation (tr -> token_transforms, defaultdict - > dict.get(value, default)) I also see no gain.

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

Sure. Another possibility is to use shorter names for variables whose lifetime is also short - if what they mean can be inferred by the context.

(for example you included an explanatory comment of what it was doing).

I added the comment because I was using lambdas, and generally people seem to be wary of them. I think if somebody doesn't need a comment to get that:

lambda x: x

is the identity function, then he won't need a comment to get that:

lambda: lambda x: x

is a factory of identity functions. Just my opinion though.

scheibelp · 2017-09-30T01:30:20Z

            except AttributeError:
                pass
-            x.name = str(x.name).replace('-', '_').upper()
+            x.name = str(x.name).replace('-', '_')


This appears to be the essential fix for #4684

This is only half of the fix. Before everything was made uppercase, now I am making uppercase only the named tokens in the environment variable names. This is to ensure backward compatibility for the most common cases, like:

modules: lmod: all: environment: set: '${PACKAGE}_ROOT': '${PREFIX}'

The main reason for that is that we have a lot of users relying on those variable being uppercase in their scripts. The argument is quite similar to the one made for --run-tests in #5132

I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).

scheibelp

I appreciate the responses. I still have a couple requests. Let me know if they are unclear or seem wrong.

scheibelp · 2017-10-02T19:07:22Z

+        #
+        # The expression `lambda: lambda x: x` is the shortest and neatest
+        # way to have a factory of identity functions in python
+        tr = collections.defaultdict(lambda: lambda x: x)


making this name longer will give a lot more line breaks due to pep8

It would if you edit it like:

write(fmt % token_transforms[named_str](self.name), '@')

however I am suggesting that you edit it like:

named_str = named_str.upper() transform = token_transforms[named_str]

and later on

write(fmt % transform(self.name), '@')

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

I think

token_transform = token_transforms.get(named_str, lambda x: x)

is clearer than

tr = collections.defaultdict(lambda: lambda x: x)

the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).

scheibelp · 2017-10-02T20:06:57Z

-                        write(fmt % str(self.variants), '+')
+                    # Not all concretized specs have variants, but we still
+                    # want to run the transformation function on the variants
+                    if hasattr(self, 'variants'):


there's always a compiler but not always a set of variants (for instance libszip has no variants)

That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.

Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.

Furthermore, regarding a request I made previously:

I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.

scheibelp · 2017-10-02T20:20:38Z

            except AttributeError:
                pass
-            x.name = str(x.name).replace('-', '_').upper()
+            x.name = str(x.name).replace('-', '_')


I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).

closes spack#2884 In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Anyhow not all the tokens that are allowed in `Spec.format` make sense in these context, hence spack#2848. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected.

closes spack#4684 The names of environment variables set / modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restrict the uppercase transformation in variable names only to `Spec.format` tokens.

The previous implementation could lead to a behavior of `Spec.format` that depended on the evaluation context. To avoid this, we have moved all the checks on tokens to the `spack.modules` subpackage. This commit also fixes a bug in the substitution of `${PACKAGE}` (where `self.name` is used instead of `name`), and changes naming and type of variables according to the review comments.

alalazo · 2017-10-03T20:23:02Z

@scheibelp Le me know if the changes are fine, or if further modifications are needed.

scheibelp · 2017-10-04T01:08:21Z

Le me know if the changes are fine, or if further modifications are needed.

Will do! I should have a chance to look at this by tomorrow.

scheibelp · 2017-10-05T01:14:23Z

Thanks for all the edits!

closes spack#2884 closes spack#4684 In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Not all the tokens that are allowed in `Spec.format` make sense in module file generation. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected. Additionally, the names of environment variables set/modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restricts the uppercase transformation in variable names to `Spec.format` tokens. Conflicts: lib/spack/spack/spec.py

…) (#114) In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Not all the tokens that are allowed in `Spec.format` make sense in module file generation. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected. Additionally, the names of environment variables set/modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restricts the uppercase transformation in variable names to `Spec.format` tokens. Conflicts: lib/spack/spack/spec.py

alalazo added bug Something isn't working modules labels Sep 26, 2017

alalazo requested review from ax3l and hartzell September 26, 2017 06:24

ax3l approved these changes Sep 26, 2017

View reviewed changes

alalazo requested review from scheibelp and tgamblin September 29, 2017 14:20

scheibelp reviewed Sep 30, 2017

View reviewed changes

scheibelp requested changes Oct 2, 2017

View reviewed changes

alalazo added 3 commits October 3, 2017 18:31

alalazo force-pushed the fixes/token_substitution_and_case_sensitivity branch from 2956a78 to 02d2d80 Compare October 3, 2017 19:10

scheibelp approved these changes Oct 5, 2017

View reviewed changes

scheibelp merged commit 3556eaa into spack:develop Oct 5, 2017

alalazo deleted the fixes/token_substitution_and_case_sensitivity branch October 5, 2017 04:10

Conversation

alalazo commented Sep 26, 2017

PR in action

Uh oh!

ax3l left a comment

Choose a reason for hiding this comment

Uh oh!

scheibelp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alalazo Oct 3, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scheibelp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alalazo commented Oct 3, 2017

Uh oh!

scheibelp commented Oct 4, 2017

Uh oh!

scheibelp commented Oct 5, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alalazo Oct 3, 2017 •

edited

Loading