module files: restricted token expansion + case sensitivity#5474
Conversation
ax3l
left a comment
There was a problem hiding this comment.
thanks a lot!
I currently have no time to RT test it, but the code looks good!
scheibelp
left a comment
There was a problem hiding this comment.
Main issue: I don't necessarily need this to change but I did want to discuss it: this seems like it's more complex than it needs to be. The logic you added supports more-general use cases like special-case logic for expressing spec parameters as strings (by customizing the functions in the dictionary). However, for all the added logic and testing in this PR, I think a simple whitelist of strings (along with a customized error message for tokens outside that list) would serve just as well. Was there a reason you went for a solution with more functionality (and complexity)?
Also: I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).
| write(fmt % str(self.variants), '+') | ||
| # Not all concretized specs have variants, but we still | ||
| # want to run the transformation function on the variants | ||
| if hasattr(self, 'variants'): |
There was a problem hiding this comment.
If bool(self.variants) == False then str(variants) == "" - are you saying that the transformation function may still want to output something when given ""? What makes this special vs. Spec.compiler?
There was a problem hiding this comment.
Are you saying that the transformation function may still want to output something when given ""?
Yes. The point for any hook using Spec.format is that in the context of a concretized spec, there's always a compiler but not always a set of variants (for instance libszip has no variants). Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.
I think the code in Spec.format can probably benefit from a refactoring, but here I just tried to do the minimal changes needed to fix the two issues i.e.:
- make uppercase only the named tokens in the environment variable names
- report any use of invalid tokens
There was a problem hiding this comment.
there's always a compiler but not always a set of variants (for instance libszip has no variants)
That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.
Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.
Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.
Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.
Furthermore, regarding a request I made previously:
I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).
If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.
There was a problem hiding this comment.
In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.
That's actually a fair point. I restored the previous check in Spec.format, and scan for invalid tokens before calling it in the spack.modules subpackage. The solution is slightly different from what you propose but is, I think, along the same line.
Are there other side effects you anticipate wanting?
Maybe substitute a token with a default string if nothing is there? Right now a format like:
'${PACKAGE}-${OPTIONS}-${VERSION}'for e.g. libszip@2.1.1 will return:
'libszip--2.1.1'and there's no way to get the ${OPTIONS} part evaluated to a default, like:
'libszip-None-2.1.1'Anyhow, just thinking ahead for possible use cases: I don't need this functionality right now.
| # | ||
| # The expression `lambda: lambda x: x` is the shortest and neatest | ||
| # way to have a factory of identity functions in python | ||
| tr = collections.defaultdict(lambda: lambda x: x) |
There was a problem hiding this comment.
(this applies if you disagree that a whitelist would be more suitable)
I'd prefer this had a more descriptive name like token_transforms and right after named_str is defined below do something like:
token_transform = token_transforms.get(named_str)
Also I think doing the following would avoid the defaultdict with lambda: lambda x: x:
token_transform = token_transforms.get(named_str, lambda x: x)
There was a problem hiding this comment.
It won't be a problem to change the naming, if if you strongly disagree with it. However: here the scope of the function is relatively contained, and making this name longer will give a lot more line breaks due to pep8. I tried having a longer name along the line of what you suggest, but in the end I went back to this short one because I was not convinced there was any readability improvement.
Also I think doing the following would avoid the defaultdict
Yes. That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?
There was a problem hiding this comment.
making this name longer will give a lot more line breaks due to pep8
It would if you edit it like:
write(fmt % token_transforms[named_str](self.name), '@')
however I am suggesting that you edit it like:
named_str = named_str.upper()
transform = token_transforms[named_str]
and later on
write(fmt % transform(self.name), '@')
generally I think line breaks can be avoided by adding lines (e.g. defining variables)
That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?
I think
token_transform = token_transforms.get(named_str, lambda x: x)
is clearer than
tr = collections.defaultdict(lambda: lambda x: x)
the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).
There was a problem hiding this comment.
Done - to be honest, despite I see no harm in this transformation (tr -> token_transforms, defaultdict - > dict.get(value, default)) I also see no gain.
generally I think line breaks can be avoided by adding lines (e.g. defining variables)
Sure. Another possibility is to use shorter names for variables whose lifetime is also short - if what they mean can be inferred by the context.
(for example you included an explanatory comment of what it was doing).
I added the comment because I was using lambdas, and generally people seem to be wary of them. I think if somebody doesn't need a comment to get that:
lambda x: xis the identity function, then he won't need a comment to get that:
lambda: lambda x: xis a factory of identity functions. Just my opinion though.
| except AttributeError: | ||
| pass | ||
| x.name = str(x.name).replace('-', '_').upper() | ||
| x.name = str(x.name).replace('-', '_') |
There was a problem hiding this comment.
This is only half of the fix. Before everything was made uppercase, now I am making uppercase only the named tokens in the environment variable names. This is to ensure backward compatibility for the most common cases, like:
modules:
lmod:
all:
environment:
set:
'${PACKAGE}_ROOT': '${PREFIX}'The main reason for that is that we have a lot of users relying on those variable being uppercase in their scripts. The argument is quite similar to the one made for --run-tests in #5132
There was a problem hiding this comment.
I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).
scheibelp
left a comment
There was a problem hiding this comment.
I appreciate the responses. I still have a couple requests. Let me know if they are unclear or seem wrong.
| # | ||
| # The expression `lambda: lambda x: x` is the shortest and neatest | ||
| # way to have a factory of identity functions in python | ||
| tr = collections.defaultdict(lambda: lambda x: x) |
There was a problem hiding this comment.
making this name longer will give a lot more line breaks due to pep8
It would if you edit it like:
write(fmt % token_transforms[named_str](self.name), '@')
however I am suggesting that you edit it like:
named_str = named_str.upper()
transform = token_transforms[named_str]
and later on
write(fmt % transform(self.name), '@')
generally I think line breaks can be avoided by adding lines (e.g. defining variables)
That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?
I think
token_transform = token_transforms.get(named_str, lambda x: x)
is clearer than
tr = collections.defaultdict(lambda: lambda x: x)
the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).
| write(fmt % str(self.variants), '+') | ||
| # Not all concretized specs have variants, but we still | ||
| # want to run the transformation function on the variants | ||
| if hasattr(self, 'variants'): |
There was a problem hiding this comment.
there's always a compiler but not always a set of variants (for instance libszip has no variants)
That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.
Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.
Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.
Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.
Furthermore, regarding a request I made previously:
I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).
If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.
| except AttributeError: | ||
| pass | ||
| x.name = str(x.name).replace('-', '_').upper() | ||
| x.name = str(x.name).replace('-', '_') |
There was a problem hiding this comment.
I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).
closes spack#2884 In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Anyhow not all the tokens that are allowed in `Spec.format` make sense in these context, hence spack#2848. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected.
closes spack#4684 The names of environment variables set / modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restrict the uppercase transformation in variable names only to `Spec.format` tokens.
The previous implementation could lead to a behavior of `Spec.format`
that depended on the evaluation context. To avoid this, we have moved
all the checks on tokens to the `spack.modules` subpackage.
This commit also fixes a bug in the substitution of `${PACKAGE}` (where
`self.name` is used instead of `name`), and changes naming and type
of variables according to the review comments.
2956a78 to
02d2d80
Compare
|
@scheibelp Le me know if the changes are fine, or if further modifications are needed. |
Will do! I should have a chance to look at this by tomorrow. |
|
Thanks for all the edits! |
closes spack#2884 closes spack#4684 In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Not all the tokens that are allowed in `Spec.format` make sense in module file generation. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected. Additionally, the names of environment variables set/modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restricts the uppercase transformation in variable names to `Spec.format` tokens. Conflicts: lib/spack/spack/spec.py
…) (#114) In spack#1848 we decided to use `Spec.format` to expand certain tokens in the module file naming scheme or in the environment variable name. Not all the tokens that are allowed in `Spec.format` make sense in module file generation. This PR restricts the set of tokens that can be used, and adds tests to check that the intended behavior is respected. Additionally, the names of environment variables set/modified by module files were, up to now, always uppercase. There are packages though that require case sensitive variable names to honor certain behaviors (e.g. OpenMPI). This PR restricts the uppercase transformation in variable names to `Spec.format` tokens. Conflicts: lib/spack/spack/spec.py
closes #2884
closes #4684
This PR restricts the token that can be expanded in the module file naming scheme and in the name of an environment variable to just
PACKAGE,VERSION,COMPILER,COMPILERNAME,COMPILERVERandARCHITECTURE(as discussed in #2884). It does so extendingSpec.formatwith an optional parameter that contains a list of transformation functions (one for each token, the default being the identity function).This new mechanism is then exploited to solve #4684. This PR changes the behavior for the expansion of tokens in environment variable names so that it respects the case sensitivity of the literal parts (token expanded in that context are still made uppercase).
PR in action
With a
modules.yamlthat looks like:where
OPTIONSis an invalid token, we get a warning and the module file is not written:The same holds true for:
There are a few lines of comments in the commit messages, so in case we should rebase and merge this (instead of squashing it)?