Skip to content

module files: restricted token expansion + case sensitivity#5474

Merged
scheibelp merged 3 commits intospack:developfrom
epfl-scitas:fixes/token_substitution_and_case_sensitivity
Oct 5, 2017
Merged

module files: restricted token expansion + case sensitivity#5474
scheibelp merged 3 commits intospack:developfrom
epfl-scitas:fixes/token_substitution_and_case_sensitivity

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Sep 26, 2017

closes #2884
closes #4684

This PR restricts the token that can be expanded in the module file naming scheme and in the name of an environment variable to just PACKAGE, VERSION, COMPILER, COMPILERNAME, COMPILERVER and ARCHITECTURE (as discussed in #2884). It does so extending Spec.format with an optional parameter that contains a list of transformation functions (one for each token, the default being the identity function).

This new mechanism is then exploited to solve #4684. This PR changes the behavior for the expansion of tokens in environment variable names so that it respects the case sensitivity of the literal parts (token expanded in that context are still made uppercase).

PR in action

With a modules.yaml that looks like:

modules:
  tcl:
    naming_scheme: '${PACKAGE}/${VERSION}-${COMPILERNAME}-${COMPILERVER}-${OPTIONS}'

where OPTIONS is an invalid token, we get a warning and the module file is not written:

$ spack install libszip
==> Installing libszip
==> Using cached archive: /home/mculpo/PycharmProjects/spack/var/spack/cache/libszip/libszip-2.1.1.tar.gz
==> Staging archive: /home/mculpo/PycharmProjects/spack/var/spack/stage/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5/szip-2.1.1.tar.gz
==> Created stage in /home/mculpo/PycharmProjects/spack/var/spack/stage/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5
==> No patches needed for libszip
==> Building libszip [AutotoolsPackage]
==> Executing phase: 'autoreconf'
==> Executing phase: 'configure'
==> Executing phase: 'build'
==> Executing phase: 'install'
==> Warning: cannot perform the requested write operation on module files [token OPTIONS cannot be part of the module naming scheme]
==> Successfully installed libszip
  Fetch: 0.01s.  Build: 5.58s.  Total: 5.59s.
[+] /home/mculpo/PycharmProjects/spack/opt/spack/linux-ubuntu14.04-x86_64/gcc-7.2.0/libszip-2.1.1-bg2wewqg2rfxg4rypj7u2lfvmhnxcug5

The same holds true for:

$ spack module refresh -m tcl libszip
==> You are about to regenerate tcl module files for:

-- linux-ubuntu14.04-x86_64 / gcc@4.8 ---------------------------
qabk3nm libszip@2.1.1

-- linux-ubuntu14.04-x86_64 / gcc@7.2.0 -------------------------
bg2wewq libszip@2.1.1

==> Do you want to proceed? [y/n] y
==> Error: token OPTIONS cannot be part of the module naming scheme

There are a few lines of comments in the commit messages, so in case we should rebase and merge this (instead of squashing it)?

@alalazo alalazo added bug Something isn't working modules labels Sep 26, 2017
@alalazo alalazo requested review from ax3l and hartzell September 26, 2017 06:24
Copy link
Copy Markdown
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot!
I currently have no time to RT test it, but the code looks good!

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main issue: I don't necessarily need this to change but I did want to discuss it: this seems like it's more complex than it needs to be. The logic you added supports more-general use cases like special-case logic for expressing spec parameters as strings (by customizing the functions in the dictionary). However, for all the added logic and testing in this PR, I think a simple whitelist of strings (along with a customized error message for tokens outside that list) would serve just as well. Was there a reason you went for a solution with more functionality (and complexity)?

Also: I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

Comment thread lib/spack/spack/spec.py Outdated
write(fmt % str(self.variants), '+')
# Not all concretized specs have variants, but we still
# want to run the transformation function on the variants
if hasattr(self, 'variants'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bool(self.variants) == False then str(variants) == "" - are you saying that the transformation function may still want to output something when given ""? What makes this special vs. Spec.compiler?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that the transformation function may still want to output something when given ""?

Yes. The point for any hook using Spec.format is that in the context of a concretized spec, there's always a compiler but not always a set of variants (for instance libszip has no variants). Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

I think the code in Spec.format can probably benefit from a refactoring, but here I just tried to do the minimal changes needed to fix the two issues i.e.:

  • make uppercase only the named tokens in the environment variable names
  • report any use of invalid tokens

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's always a compiler but not always a set of variants (for instance libszip has no variants)

That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.

Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.

Furthermore, regarding a request I made previously:

I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.

Copy link
Copy Markdown
Member Author

@alalazo alalazo Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

That's actually a fair point. I restored the previous check in Spec.format, and scan for invalid tokens before calling it in the spack.modules subpackage. The solution is slightly different from what you propose but is, I think, along the same line.

Are there other side effects you anticipate wanting?

Maybe substitute a token with a default string if nothing is there? Right now a format like:

'${PACKAGE}-${OPTIONS}-${VERSION}'

for e.g. libszip@2.1.1 will return:

'libszip--2.1.1'

and there's no way to get the ${OPTIONS} part evaluated to a default, like:

'libszip-None-2.1.1'

Anyhow, just thinking ahead for possible use cases: I don't need this functionality right now.

Comment thread lib/spack/spack/spec.py Outdated
#
# The expression `lambda: lambda x: x` is the shortest and neatest
# way to have a factory of identity functions in python
tr = collections.defaultdict(lambda: lambda x: x)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this applies if you disagree that a whitelist would be more suitable)

I'd prefer this had a more descriptive name like token_transforms and right after named_str is defined below do something like:

token_transform = token_transforms.get(named_str)

Also I think doing the following would avoid the defaultdict with lambda: lambda x: x:

token_transform = token_transforms.get(named_str, lambda x: x)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be a problem to change the naming, if if you strongly disagree with it. However: here the scope of the function is relatively contained, and making this name longer will give a lot more line breaks due to pep8. I tried having a longer name along the line of what you suggest, but in the end I went back to this short one because I was not convinced there was any readability improvement.

Also I think doing the following would avoid the defaultdict

Yes. That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making this name longer will give a lot more line breaks due to pep8

It would if you edit it like:

write(fmt % token_transforms[named_str](self.name), '@')

however I am suggesting that you edit it like:

named_str = named_str.upper()
transform = token_transforms[named_str]

and later on

write(fmt % transform(self.name), '@')

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

I think

token_transform = token_transforms.get(named_str, lambda x: x)

is clearer than

tr = collections.defaultdict(lambda: lambda x: x)

the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - to be honest, despite I see no harm in this transformation (tr -> token_transforms, defaultdict - > dict.get(value, default)) I also see no gain.

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

Sure. Another possibility is to use shorter names for variables whose lifetime is also short - if what they mean can be inferred by the context.

(for example you included an explanatory comment of what it was doing).

I added the comment because I was using lambdas, and generally people seem to be wary of them. I think if somebody doesn't need a comment to get that:

lambda x: x

is the identity function, then he won't need a comment to get that:

lambda: lambda x: x

is a factory of identity functions. Just my opinion though.

except AttributeError:
pass
x.name = str(x.name).replace('-', '_').upper()
x.name = str(x.name).replace('-', '_')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the essential fix for #4684

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only half of the fix. Before everything was made uppercase, now I am making uppercase only the named tokens in the environment variable names. This is to ensure backward compatibility for the most common cases, like:

modules:
  lmod:
    all:
      environment:
        set:
          '${PACKAGE}_ROOT': '${PREFIX}'

The main reason for that is that we have a lot of users relying on those variable being uppercase in their scripts. The argument is quite similar to the one made for --run-tests in #5132

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the responses. I still have a couple requests. Let me know if they are unclear or seem wrong.

Comment thread lib/spack/spack/spec.py Outdated
#
# The expression `lambda: lambda x: x` is the shortest and neatest
# way to have a factory of identity functions in python
tr = collections.defaultdict(lambda: lambda x: x)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making this name longer will give a lot more line breaks due to pep8

It would if you edit it like:

write(fmt % token_transforms[named_str](self.name), '@')

however I am suggesting that you edit it like:

named_str = named_str.upper()
transform = token_transforms[named_str]

and later on

write(fmt % transform(self.name), '@')

generally I think line breaks can be avoided by adding lines (e.g. defining variables)

That would avoid defaultdict. Not that it is a big deal, but why do you think this is preferable?

I think

token_transform = token_transforms.get(named_str, lambda x: x)

is clearer than

tr = collections.defaultdict(lambda: lambda x: x)

the former IMO is self-explanatory. The latter is confusing IMO (for example you included an explanatory comment of what it was doing).

Comment thread lib/spack/spack/spec.py Outdated
write(fmt % str(self.variants), '+')
# Not all concretized specs have variants, but we still
# want to run the transformation function on the variants
if hasattr(self, 'variants'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's always a compiler but not always a set of variants (for instance libszip has no variants)

That is true in the implied context you are working with I suppose: at the time module files are generated, the package would be installed and would have a compiler. That being said I think it's a bit unclear what is going on here and that implied context could easily be lost later on, so I'd prefer a more direct approach. In other words I'd prefer it if future readers and editors of this code didn't have to think about the context where Spec.format was being called from.

Nonetheless, I still want to evaluate the custom transformation functions because they may have side effects - like raising if an invalid token is used in the format string.

Are there other side effects you anticipate wanting? Generating an error is the only one I can think of. IMO it is better to make the effects explicit rather than depending on them to occur as side effects.

Suggestion: If the dictionary you pass only contained entries for tokens you are willing to write out, then this could detect a problem without having to invoke a function. I think that would be clearer and cover the only use case I'm currently aware of for spec properties that shouldn't be supported (for a particular invocation of Spec.format). You could also edit the method signature of Spec.format to optionally pass a customized error message, but actually IMO it would be more straightforward to throw an instance of PropertyNotSupported and catch it in the module file generator.

Furthermore, regarding a request I made previously:

I wouldn't require it of this PR, but it would be nice if it failed before calling functions like .use_name, like when the module writer was first initialized. Perhaps something simple like passing a non-concrete spec would work (for a package that doesn't actually exist).

If the dictionary only contained transform functions for supported Spec properties, then I think one could pass in an "empty" spec and detect issues with the format string before attempting to generate a module file name for a specific Spec. That's not necessarily motivating write now because the module file generators aren't instantiated until after a Package is installed, but if they were instantiated before package installations you could know there was an error with the format string before installing a package. To be clear: I'm not suggesting that the time when module file generators are instantiated should be changed here.

except AttributeError:
pass
x.name = str(x.name).replace('-', '_').upper()
x.name = str(x.name).replace('-', '_')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed here that module file generation may want to customize the extracted spec properties differently than other callers of Spec.format so I can see why it is useful to pass a dictionary of transformation functions. Originally I was confused why you didn't just pass in a set of Spec properties that should be supported. A dictionary of functions makes sense to me for this. That being said I mention elsewhere that I think this dictionary shouldn't contain functions for Spec properties you don't support (for a particular invocation of Spec.format).

closes spack#2884

In spack#1848 we decided to use `Spec.format` to expand certain tokens in
the module file naming scheme or in the environment variable name.
Anyhow not all the tokens that are allowed in `Spec.format` make sense
in these context, hence spack#2848.

This PR restricts the set of tokens that can be used, and adds tests to
check that the intended behavior is respected.
closes spack#4684

The names of environment variables set / modified by module files were,
up to now, always uppercase. There are packages though that require
case sensitive variable names to honor certain behaviors (e.g. OpenMPI).

This PR restrict the uppercase transformation in variable names only
to `Spec.format` tokens.
The previous implementation could lead to a behavior of `Spec.format`
that depended on the evaluation context. To avoid this, we have moved
all the checks on tokens to the `spack.modules` subpackage.

This commit also fixes a bug in the substitution of `${PACKAGE}` (where
`self.name` is used instead of `name`), and changes naming and type
of variables according to the review comments.
@alalazo alalazo force-pushed the fixes/token_substitution_and_case_sensitivity branch from 2956a78 to 02d2d80 Compare October 3, 2017 19:10
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Oct 3, 2017

@scheibelp Le me know if the changes are fine, or if further modifications are needed.

@scheibelp
Copy link
Copy Markdown
Member

Le me know if the changes are fine, or if further modifications are needed.

Will do! I should have a chance to look at this by tomorrow.

@scheibelp scheibelp merged commit 3556eaa into spack:develop Oct 5, 2017
@scheibelp
Copy link
Copy Markdown
Member

Thanks for all the edits!

@alalazo alalazo deleted the fixes/token_substitution_and_case_sensitivity branch October 5, 2017 04:10
alalazo added a commit to epfl-scitas/spack that referenced this pull request Oct 19, 2017
closes spack#2884
closes spack#4684

In spack#1848 we decided to use `Spec.format` to expand certain tokens in
the module file naming scheme or in the environment variable name.
Not all the tokens that are allowed in `Spec.format` make sense in
module file generation. This PR restricts the set of tokens that can
be used, and adds tests to check that the intended behavior is respected.

Additionally, the names of environment variables set/modified by module
files were, up to now, always uppercase. There are packages though that
require case sensitive variable names to honor certain behaviors (e.g.
OpenMPI). This PR restricts the uppercase transformation in variable
names to `Spec.format` tokens.
Conflicts:
	lib/spack/spack/spec.py
alalazo added a commit to epfl-scitas/spack that referenced this pull request Oct 19, 2017
…) (#114)


In spack#1848 we decided to use `Spec.format` to expand certain tokens in
the module file naming scheme or in the environment variable name.
Not all the tokens that are allowed in `Spec.format` make sense in
module file generation. This PR restricts the set of tokens that can
be used, and adds tests to check that the intended behavior is respected.

Additionally, the names of environment variables set/modified by module
files were, up to now, always uppercase. There are packages though that
require case sensitive variable names to honor certain behaviors (e.g.
OpenMPI). This PR restricts the uppercase transformation in variable
names to `Spec.format` tokens.
Conflicts:
	lib/spack/spack/spec.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working modules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modules: Spelling of Environment Vars Module naming scheme OPTIONS and the emacs package

3 participants