
[BUG] python->LAMMPS->python patterns broken on some interpreters. #3204

@lubbersnick


Summary

When LAMMPS is opened through the Python interface to the shared library, LAMMPS commands that invoke the Python interpreter from inside LAMMPS can cause fatal errors.

LAMMPS Version and Platform

LAMMPS Versions: Probably many, but found in at least LAMMPS (29 Oct 2020), LAMMPS (29 Aug 2021), LAMMPS (31 Aug 2021), and LAMMPS (24 Mar 2022).
Operating system: macOS, but this is probably about the build of Python rather than the OS per se.
Python: Various conda builds of python, including anaconda distributions as well as conda-forge distributions.

Expected Behavior

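Commands that call back into Python from inside LAMMPS should work when LAMMPS itself is driven from Python, as they do on other interpreters.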

Actual Behavior

The program crashes due to a failure to acquire references to the active python interpreter.

Depending on the Python build, one of several errors appears (the normal output is omitted below). The failure appears to occur when funcs.py attempts to import the lammps Python module.

Segfault:

<...>
LOOP ARGS 10 1.0 -4.0 <capsule object NULL at 0x10a2bb720>
Segmentation fault: 11

Fatal Python Error:

<...>
LOOP ARGS 10 1.0 -4.0 <capsule object NULL at 0x7fd8a813a780>
Fatal Python error: _PyInterpreterState_Get(): no current thread state
Python runtime state: initialized

Free without allocation error:

<...>
python(28375,0x1114f9e00) malloc: *** error for object 0x10422f080: pointer being freed was not allocated
python(28375,0x1114f9e00) malloc: *** set a breakpoint in malloc_error_break to debug
 *** Process received signal ***
 Signal: Abort trap: 6 (6)
 Signal code:  (0)
 [ 0] 0   libsystem_platform.dylib            0x00007fff20994d7d _sigtramp + 29
 [ 1] 0   ???                                 0x00007fc7ba70d830 0x0 + 140495803177008
 [ 2] 0   libsystem_c.dylib                   0x00007fff208a4406 abort + 125
 [ 3] 0   libsystem_malloc.dylib              0x00007fff20784165 has_default_zone0 + 0
 [ 4] 0   libsystem_malloc.dylib              0x00007fff207872aa malloc_report + 151
 [ 5] 0   python                              0x0000000103c8ccd3 _PyObject_Free + 115

Steps to Reproduce

The easiest way to reproduce the issue is to run the following script in the examples/python directory of the LAMMPS source tree:

import lammps

# Run from examples/python: in.python calls back into Python (funcs.py) from LAMMPS.
lmp = lammps.lammps()
lmp.file('in.python')

Further Information, Files, and Links

This error seems to occur because the Python executable is linked against its static library rather than its shared library. When Python is the driving code and LAMMPS checks whether the Python interpreter is active, Py_IsInitialized() returns 0, even though it should return 1.

As such, the error can be detected without LAMMPS at all:

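import sysconfig
import ctypes

# INSTSONAME is the file name of the Python shared library for this build;
# get_config_var returns None if the variable is not defined.
library = sysconfig.get_config_var('INSTSONAME')
# Loading the library by name maps it into the process; on the affected builds
# this appears to be a separate copy that is not the running interpreter.
pylib = ctypes.CDLL(library)
if not pylib.Py_IsInitialized():
    raise RuntimeError("This interpreter is not compatible with python->lammps->python control flow.")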

(On some builds INSTSONAME reports the wrong name for the python shared library, but this is a separate issue.)
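Because INSTSONAME is not always the right name, a diagnostic script may want a few fallbacks before giving up. The sketch below collects candidate names for the shared library from INSTSONAME, LDLIBRARY, and `ctypes.util.find_library`; the helper name, the search order, and the decision to skip static `.a` archives are assumptions for illustration, not anything LAMMPS does today.

```python
import ctypes.util
import os
import sys
import sysconfig


def candidate_libpython_names():
    """Hypothetical helper: possible names/paths for the Python shared library.

    Order (an assumption): INSTSONAME, then LDLIBRARY, then whatever
    ctypes.util.find_library reports. Static archives (.a) are skipped
    because ctypes cannot load them.
    """
    libdir = sysconfig.get_config_var("LIBDIR") or ""
    candidates = []
    for key in ("INSTSONAME", "LDLIBRARY"):
        name = sysconfig.get_config_var(key)
        if not name or name.endswith(".a"):
            continue
        full_path = os.path.join(libdir, name)
        # Prefer an absolute path when the file exists in LIBDIR;
        # otherwise keep the bare name and rely on the loader's search path.
        candidates.append(full_path if os.path.exists(full_path) else name)
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    found = ctypes.util.find_library(f"python{version}")
    if found:
        candidates.append(found)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


print(candidate_libpython_names())
```

Each candidate could then be passed to ctypes.CDLL in the check above; an empty list means none of these sources named a loadable shared library.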

I have also confirmed that Py_IsInitialized() returns 0 when called from within LAMMPS.

You can likewise reproduce the segmentation fault in the interpreter by doing something nontrivial with the loaded library, such as importing a module:

>>> pylib.PyImport_ImportModule(b"collections")
Segmentation fault: 11

Note: within Python itself, ctypes.pythonapi -can- access the C API of the running interpreter, so the behavior above is not really a bug in Python; loading the shared library separately is just the wrong way of reaching the running interpreter.
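For contrast, a minimal sketch of the same two calls made through ctypes.pythonapi instead of a separately loaded library. It assumes only the standard library; note that setting `restype`/`argtypes` on `ctypes.pythonapi` attributes changes shared process-wide state, which is acceptable for a diagnostic but not for library code.

```python
import collections
import ctypes

# ctypes.pythonapi is a PyDLL bound to the C API of the interpreter that is
# already running, so these calls reach the live interpreter state.
api = ctypes.pythonapi

api.Py_IsInitialized.restype = ctypes.c_int
api.Py_IsInitialized.argtypes = []
initialized = api.Py_IsInitialized()
print("Py_IsInitialized():", initialized)  # nonzero inside a running interpreter

# The import that segfaults through the separately loaded library works here.
# restype=py_object converts the returned PyObject* into a Python object;
# PyDLL calls hold the GIL and raise if the C call sets a Python error.
api.PyImport_ImportModule.restype = ctypes.py_object
api.PyImport_ImportModule.argtypes = [ctypes.c_char_p]
module = api.PyImport_ImportModule(b"collections")
print(module is collections)  # True: the already-imported module object
```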

My conclusion so far is that when the Python shared library is loaded under a statically linked build of Python, the symbols in the shared library are distinct from the ones used by the running interpreter. What we want is for the calls in LAMMPS to call back into the active interpreter's symbol space.

  • This issue also affects the MLIAP package, and is the reason for the inclusion of the code here, but I want to be clear that this problem has nothing to do with the Cython extensions used in MLIAP; the above shows that it stems from how the LAMMPS Python module operates.

  • A comment on stanford-centaur/pono#92 ("Python bindings should use -undefined dynamic_lookup when Py_ENABLE_SHARED is 0") says that the sysconfig variable "Py_ENABLE_SHARED" can be used to detect a similar-sounding problem. However, I have a working python+LAMMPS on Linux where Py_ENABLE_SHARED is zero, so I don't know if we can trust that.

  • A comment on conda-forge/python-feedstock#222 ("Are you switching to dynamic linking?") says that one may be able to load the Python executable as a shared library to get around a similar-sounding problem. I attempted this without success: I could fool ctypes into thinking that the executable was a library, but it returned 0 for Py_IsInitialized().

  • In principle, scikit-build has a function called target_link_libraries_with_dynamic_lookup that may be intended to handle this kind of 'lazy load'. I tried to patch it into the LAMMPS CMake build, but got an error I don't know how to fix: the test project needs language C, which is not enabled.
    This function is also tied to some scikit-build machinery that seems aimed at helping to embed Python or build extensions.

  • I have some old conda environments lying around from the late Python 3.6/early 3.7 era that used to work without crashing, so it's not a fundamental problem with macOS.
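When comparing environments like these, it may help to record the build variables mentioned in the notes above. The sketch below gathers them with `sysconfig.get_config_var`; the helper name and the `uses_dynamic_lookup` field are illustrative assumptions. Since a working Linux setup had Py_ENABLE_SHARED equal to 0, treat the output as a diagnostic hint, not a predictor of the crash.

```python
import sysconfig


def describe_python_build():
    """Hypothetical helper: collect build variables relevant to this issue.

    Values are None when a variable is not defined for this platform/build.
    """
    keys = ("Py_ENABLE_SHARED", "LDLIBRARY", "INSTSONAME", "LDSHARED")
    info = {key: sysconfig.get_config_var(key) for key in keys}
    # LDSHARED is the link command used for extension modules; on macOS it
    # may include "-undefined dynamic_lookup" (the flag discussed in pono#92).
    info["uses_dynamic_lookup"] = "-undefined dynamic_lookup" in (info["LDSHARED"] or "")
    return info


for key, value in describe_python_build().items():
    print(f"{key} = {value!r}")
```

Pasting this output for a working and a crashing environment side by side would make the build differences easier to compare.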

Some potential ways to address this

  1. If the LAMMPS Python module were built as a C extension from Python, this problem would likely not occur: in that case, the lammps module's references to the Python C API would not go through a separately loaded Python shared library. While this would most likely work, it might significantly change the build process and involve refactoring how Python talks to LAMMPS. From my limited experience, it is not too difficult to wrap the calls in liblammps using Cython and compile an extension module for a given set of LAMMPS compiler settings; however, automatically integrating the various compiler flags that LAMMPS uses (serial vs. parallel, BIGINT size) into the setup.py file was nontrivial for me. While such a project could have other benefits, it is highly unclear to me whether it would be worth the work, so it's probably not the right place to start.

  2. Potentially, changes in how linking works: there may be differences in the compilation flags for embedded Python vs. Python extension modules that could be worked into the LAMMPS build process. This might require building a separate compiled shared library for use in the lammps Python module; I don't know. The scikit-build machinery, or parts of it, might help here.

  3. Leave it alone: decide that this use case is outside the scope of LAMMPS, at least for now.
    3b) Raise an error or warning for the user ahead of crashing: one could add some state to the liblammps library that the Python interpreter sets when opening LAMMPS. When a command that requires the Python interpreter runs, LAMMPS can detect that the flag is set and raise an error or warning instead of just dying. This is similar to the warning in lammps.mliap, but would be enforced in all scenarios where this can be a problem (through lammps/src/pythonimpl).
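A rough sketch of the Python-side half of option 3b: detect the problem when the lammps module opens LAMMPS, then warn or raise instead of letting a later callback crash. Everything here is hypothetical (the function names, the exception class, and the `on_failure` parameter are not part of the LAMMPS Python module), and the detection simply reuses the Py_IsInitialized() test from earlier in this issue. Recording the result as state in liblammps and checking it in the C++ code that runs Python commands is not shown.

```python
import ctypes
import sysconfig
import warnings


class PythonCallbackUnsupported(RuntimeError):
    """Hypothetical error type; not part of the LAMMPS Python module."""


def _separate_libpython_initialized():
    """Ask a separately loaded libpython whether it sees an initialized
    interpreter; None if the library cannot be loaded."""
    name = sysconfig.get_config_var("INSTSONAME")
    if not name:
        return None
    try:
        func = ctypes.CDLL(name).Py_IsInitialized
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    return bool(func())


def check_python_callbacks(on_failure="warn"):
    """Hypothetical open-time check for option 3b.

    Returns True if callbacks look safe, None if the check is inconclusive,
    and False (after warning) if they look unsafe. With on_failure="error",
    the unsafe case raises PythonCallbackUnsupported instead.
    """
    state = _separate_libpython_initialized()
    if state is None:
        return None
    if state:
        return True
    message = ("This interpreter is not compatible with python->LAMMPS->python "
               "control flow; commands that call back into Python may crash.")
    if on_failure == "error":
        raise PythonCallbackUnsupported(message)
    warnings.warn(message, RuntimeWarning, stacklevel=2)
    return False
```

A caller would run `check_python_callbacks()` before `lammps.lammps()`. Treating an unloadable library as "inconclusive" rather than "unsafe" is a design choice that avoids false alarms when INSTSONAME names the wrong file.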

@rbberger Is this something you could take a look at?

@junghans Any idea if target_link_libraries_with_dynamic_lookup is something that might help?

Thanks to @jan-janssen and @athomps for helping to confirm.
