The problem
The design of libGL drivers is such that the userspace part of the driver consists of a libGL.so that gets loaded in each process using OpenGL. That is, each driver vendor (Mesa, NVIDIA, AMD) ships their own libGL.so that we select dynamically (and impurely) by having NixOS set:
LD_LIBRARY_PATH=/run/opengl-driver/lib:/run/opengl-driver-32/lib
and having the NixOS module set up symlinks pointing to the proper packages depending on the system configuration. While the OpenGL ABI itself is stable, a major pain point caused by this impurity is conflicting versions of libraries that both the driver and the application depend on.
Issue #16779 shows a manifestation of this problem: applications built on NixOS 16.03 would stop working on NixOS 16.09 because of a version conflict in libwayland.so, which is used by both the application and Mesa. The application itself causes version X of libwayland.so to be loaded into the process, but Mesa requires version Y of libwayland.so, so the application cannot start up and fails with:
Note that this problem is not inherently specific to NixOS -- the same problem is known to happen on other distros as well when the libstdc++ version provided by the Steam runtime conflicts with the libstdc++ that Mesa requires.
A (potential) solution
An attempt at solving this has been made in the libcapsule project (https://git.collabora.com/cgit/user/vivek/libcapsule.git/tree/README) by a Collabora employee. The approach taken there is to build a stub libGL.so that uses the little-known dlmopen() function to create a completely new symbol namespace for dynamic linking and load the graphics driver's real libGL.so there. All exported symbols of the stub libGL.so are then redirected to the entry points of the real libGL.so living in the segregated dynamic linker namespace. This is implemented via a clever hack of patching the PLT of the stub libGL.so to point to the real libGL.so's entry points, so there is zero overhead for function calls to libGL!
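To make the dlmopen() part concrete, here is a minimal sketch of loading a library into a fresh link-map namespace and resolving a symbol from it. It is not libcapsule's code and does no PLT patching; the helper names load_in_new_namespace() and is_isolated() are made up for illustration, and it assumes glibc on Linux (older glibc versions need -ldl at link time).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for dlmopen(), LM_ID_NEWLM and RTLD_DI_LMID */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `soname` (and its dependencies) into a brand-new
 * dynamic linker namespace. Returns NULL on failure. */
static void *load_in_new_namespace(const char *soname)
{
    void *handle = dlmopen(LM_ID_NEWLM, soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        fprintf(stderr, "dlmopen(%s) failed: %s\n", soname, dlerror());
    return handle;
}

/* Hypothetical helper: returns 1 if `handle` lives outside the main
 * namespace (LM_ID_BASE), 0 if it lives inside it, -1 on error. */
static int is_isolated(void *handle)
{
    Lmid_t id;
    if (dlinfo(handle, RTLD_DI_LMID, &id) != 0)
        return -1;
    return id != LM_ID_BASE;
}
```

A stub library would then look up each entry point of the real driver with dlsym() on the returned handle. With libm.so.6 standing in for the driver's libGL.so, that is `double (*fn)(double) = (double (*)(double))dlsym(handle, "cos");`.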
The problems in practice
I attempted to package and use libcapsule during NixCon 2017, with not-so-great success (https://github.com/dezgeg/libcapsule, https://github.com/dezgeg/nixpkgs/tree/libcapsule). While the approach taken by libcapsule seems theoretically sound, one problem seems to be that the proxied libGL driver also needs to provide exports for libX11.so and some xcb libraries. I'm not totally sure why, but I'm guessing the X11 client driver keeps some per-process state on which GLX client-side library is associated with which X screen, so having two different copies of libX11.so, one in the main symbol namespace and one inside the capsuled symbol namespace, would break things.
Now, that causes a problem because libraries like libXi (probably accidentally) allocate memory with malloc() from outside the capsule but free it with XFree(), which crashes because XFree() calls the free() inside the capsule, and the two glibcs of course have independent heaps. AFAICT, there's currently no way to have certain libraries loaded only once and shared by both the main namespace and the in-capsule dlmopen() namespace.
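The two-heaps issue can be observed without crashing anything. The sketch below loads a second copy of glibc into a new namespace and compares its free() with the one the main program resolves. The function name capsule_has_separate_free() is hypothetical, and the soname libc.so.6 is an assumption that holds on common glibc/Linux platforms.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for dlmopen(), LM_ID_NEWLM and RTLD_DEFAULT */
#endif
#include <dlfcn.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if a libc loaded into a fresh namespace
 * exposes a different free() than the main namespace, 0 if both resolve
 * to the same address, -1 if loading or symbol lookup fails. */
static int capsule_has_separate_free(void)
{
    /* The handle is deliberately never dlclose()d in this sketch. */
    void *capsule = dlmopen(LM_ID_NEWLM, "libc.so.6", RTLD_NOW | RTLD_LOCAL);
    if (capsule == NULL)
        return -1;

    void *inner_free = dlsym(capsule, "free");      /* free() inside the capsule */
    void *outer_free = dlsym(RTLD_DEFAULT, "free"); /* free() the application uses */
    if (inner_free == NULL || outer_free == NULL)
        return -1;

    /* Different addresses mean a different allocator instance, so memory
     * from the outer malloc() must never reach the inner free(). */
    return inner_free != outer_free;
}
```

This only compares addresses. Actually passing a pointer from the outer malloc() to the inner free() is the mismatch described above and is expected to corrupt the heap or abort.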
A potential way to avoid that problem might be to use libcapsule between libglvnd and the driver, which shouldn't require the hack of exporting symbols from libX11. What worries me a bit, though, is whether having multiple glibcs loaded will work either, given that there are some per-process and per-thread kernel APIs where the two glibcs might step on each other's feet (set_robust_list() and sbrk() come to mind). But presumably the glibc people have given at least some thought to that, or the entire dlmopen() would become pretty much useless...
cc: @vcunat, @abbradar