Add NEB, ApproxNEB jobs / workflows#1007
Add NEB, ApproxNEB jobs / workflows#1007esoteric-ephemera merged 111 commits intomaterialsproject:mainfrom
Conversation
for clarity
|
@JaGeo should be ready to merge in. There are a few simple tests of the workflows in the PR. From more extensive tests (especially of the VASP Approx / NEB) in our group, these seem to be performant. Avoiding adding more for now since these are longer-running flows Over time, we may want to modify the document schema a bit (I've already found at least one spot where metadata could be improved) but I would lean towards merging in since these are only additions @davidwaroquiers I'll release a new version after merging |
gpetretto
left a comment
There was a problem hiding this comment.
Hi @esoteric-ephemera, thanks a lot for the great work implementing this.
I did not go through the code in details, but testing it I came across a couple of potential issues that I mentioned in the comments below.
As an additional point I wanted to check if I am getting right that at the moment there is no equivalent of the NebFromEndpointsMaker for the ASE NEB. It is easy to create such a flow manually, but it may be convenient to have something like this as well.
| "You must pip install `pymatgen-analysis-diffusion` " | ||
| "to generate images with IDPP." | ||
| ) from exc | ||
| return IDPPSolver.from_endpoints( |
There was a problem hiding this comment.
It seems that here an instance of IDPPSolver is returned. I think that the output of this function should be the output of the IDPPSolver.run() method.
There was a problem hiding this comment.
Good catch! Will add some logic for separating out the IDPPSolver.run and constructor kwargs
src/atomate2/common/jobs/neb.py
Outdated
| """ | ||
| if interpolation_method == NebInterpolation.LINEAR: | ||
| return endpoints[0].interpolate( | ||
| endpoints[1], nimages=num_images, **interpolation_kwargs |
There was a problem hiding this comment.
I have seen that the pymatgen interpolate method behaves in a somewhat weird way: nimages+1 structures are returned: https://github.com/materialsproject/pymatgen/blob/f9d9fe8e0ce09ef30cc03bcc4e9937d27afd5a6a/src/pymatgen/core/structure.py#L2443
So you get a set of structures which is not nimages in total nor nimages plus the initial and final structure. Which I found quite unexpected.
In addition this is at variance with what the IDPPSolver does. In that case a total of nimages+2 structures is returned.
So the meaning of nimages varies depending on the inteprolation method.
I would suggest passing num_images+1 to interpolate. This would maintain consistency among the methods and would also seem more reasonable.
|
@gpetretto thanks for reviewing and testing! |
|
Thanks @gpetretto! Have drafted up the ASE NEB from endpoints maker, still needs to be incorporated with the forcefields. I'll get back to this next week |
Thanks a lot for putting all of this together! |
gpetretto
left a comment
There was a problem hiding this comment.
Sorry if this comes as a second round of comments, but I have now tested also the VASP and ApproxNEB and encountered a few issues there as well.
See also the related PR on emmet: materialsproject/emmet#1255
src/atomate2/vasp/jobs/neb.py
Outdated
| }, | ||
| } | ||
| ) | ||
| lclimb: bool = True |
There was a problem hiding this comment.
Not sure if this is supposed to be propagated to the NebSetGenerator or if it is a leftover of a previous option, but it looks like it is not used anywhere.
There was a problem hiding this comment.
It was a holdover from earlier logic, thanks for the catch
src/atomate2/vasp/jobs/neb.py
Outdated
| } | ||
| ) | ||
| lclimb: bool = True | ||
| kpoints_kludge: Kpoints | None = None |
There was a problem hiding this comment.
it may be good to add the docstrings for this option, as its use is quite obscure without looking at the code.
There was a problem hiding this comment.
Good point, I'll move it into the main docstr (the explanation was buried in the maker) - will also change the default behavior to use this fix
| num_sites = len(images[0]) | ||
|
|
||
| tags = [os.getcwd()] | ||
| is_force_conv = all( |
There was a problem hiding this comment.
I did not try to do the math myself to see what the problem could be with this point, but in one of my tests I had an issue. The optimizer considered the forces converged and stopped the optimization after few loops, but this check was False and thus the task was marked as failed.
More in general, is there a need for this check specifically? optimize.run already returns a bool to specify if the optimization converged or not, why not using that?
There was a problem hiding this comment.
Probably has to do with the difference in NEB forces (interatomic + spring) vs plain interatomic forces. I'll take this out - I like the idea of using optimize.run to determine the task state, I'll update the other jobs as well
| run_vasp(**self.run_vasp_kwargs) | ||
|
|
||
| # parse vasp outputs | ||
| task_doc = get_vasp_task_document( |
There was a problem hiding this comment.
This is a generic comment on the NebFromImagesMaker, but I wanted to mention that I found somewhat confusing the fact that this job returns a NebTaskDoc, because by construction this will only contain the energies of the intermediate images and not those of the endpoints. As a consequence all the barrier analysis and values reported in the output document are wrong.
Even assuming that this is always used inside the NebFromEndpointsMaker the output database will contain two NebTaskDoc coming from the same flow, one of which is incomplete.
I am afraid I don't have a much better way of proceeding to propose. One potential suggestion would be that the output of this job should only be a minimal document with the list of folders and some information about the vasp calculation, relying on the fact that a collect_neb_output Job will be run as a subsequent step.
There was a problem hiding this comment.
Hmm, maybe there should be an NebIntermediateImageResult class that has this minimal info. Wouldn't need to be in emmet since this is just for book-keeping on the atomate2 side
| "PREC": "Normal", | ||
| "NSW": 99, | ||
| "LCHARG": False, | ||
| "IBRION": 2, |
There was a problem hiding this comment.
I don't know if it was just because my example was too trivial, but with the default EDIFF the forces were not converging even for a large number of steps, probably due to the SCF not being properly converged. It instead converged was quite fast as soon as I lowered EDIFF. I don't have many test cases, but if you also encountered this issue it may be worth considering using a lower value as a default.
There was a problem hiding this comment.
Always hard to say with NEB but EDIFF = 1e-6 or 1e-7 might be a safer default
| ) | ||
| run_vasp_kwargs: dict = field( | ||
| default_factory=lambda: { | ||
| "job_type": JobType.DOUBLE_RELAXATION, |
There was a problem hiding this comment.
Maybe I am missing something but it seems that with set_type="image" the INCAR containts ISIF=2 and thus volume is not changing. So what is the point of the double relaxation here?
There was a problem hiding this comment.
This is mostly for continuity / reproducibility of the original atomate version. Might be good to have a legacy / current split of the input sets, just like we do for the EOS workflows
| # compatibility with legacy input (list) | ||
| if isinstance(inserted_coords_dict, list): | ||
| inserted_coords_dict = dict(enumerate(inserted_coords_dict)) | ||
|
|
There was a problem hiding this comment.
It would be good to have a check here to verify that inserted_coords_dict and inserted_coords_combo are consistent. I had a typo in my inserted_coords_combo and this caused an error only late in the Flow execution, with no obvious link to the cause of the error.
| the mobile species in ApproxNEB | ||
| inserted_coords_dict: dict or list | ||
| a dictionary containing site coords (endpoints) for working ions | ||
| in the simulation cell |
There was a problem hiding this comment.
Better specify that these should be fractional coordinates. I had to look at the code to know if they had to be fractional or cartesian.
| """ | ||
| ep_jobs: list[Job] = [] | ||
| ep_output: dict[str, dict[str, Any]] = {} | ||
| for idx, ep in enumerate(end_point_structures): |
There was a problem hiding this comment.
It is mentioned in the docstrings, but I would add a check that only two structures are actually passed in the end_point_structures.
In principle I suppose there would be no particular issue implementing this Flow accepting more than 2 endpoints and calculating all the subsequent jumps. Before checking the docstrings I was expecting it would have been possible to pass more than 2.
| ep_relax_output = {} | ||
| ep_relax_jobs = [] | ||
| for ep_index, ep_coords in endpoint_coords.items(): | ||
| if int(ep_index) in ep_distinct: |
There was a problem hiding this comment.
Maybe it is redundant with the other check I am suggesting in CommonApproxNebMaker, but in case this Job ends up being used outside that flow it may be good to make a check here as well. In this loop if one of the keys in ep_distinct is not endpoint_coords the code will just pass without creating the relaxation job for that endpoint, but fail later on.
There was a problem hiding this comment.
Having both is a good fallback
| name: str = "Forcefield NEB from images" | ||
| force_field_name: str | MLFF = MLFF.Forcefield | ||
|
|
||
| @job(data=_FORCEFIELD_DATA_OBJECTS, schema=NebResult) |
There was a problem hiding this comment.
Sorry, I just realized that here and in all the other jobs the schema is set through the schema argument, but it should be output_schema. Jobflow does not raise an error since a job accepts kwargs.
|
@gpetretto @JaGeo I think I've addressed all comments - any objections to merging? |
|
Also @fraricci I've added the checks you suggested for NPTBerendsen, plus a test for correctly setting the MD kwargs |
|
@JaGeo noticed I had some typos in the MACE + explicit D3 calculator, fixed those, added a better test, and am going to merge this PR |
|
Sound good @esoteric-ephemera ! |
Summary
Adding NEB workflows and schemas for VASP / ASE / MLFF. This hopefully brings atomate2 to full parity with the feature set of atomate
@hmlli is leading the port of ApproxNEB flows for VASP
Very open to suggestions, particularly for document schema structure
Key updates:
Miscellaneous