Comment:
The conda-forge docs for the microarch-optimized builds have an example that uses `microarch_level: 4`. But the README for this feedstock contains the following caveat:

> When building packages on CI, level=4 will not be guaranteed, so you can only use level<=3 to build.

Indeed, when I tried to use level 4, I saw failures (in my case, it was on osx).

Nonetheless, I'd like to produce optimized builds for machines that support AVX-512 (level 4). This was possible by explicitly adding the necessary build flag in `build.sh` and then explicitly listing the appropriate `run` dependency:
```yaml
# conda_build_config.yaml
microarch_level:
  - 1
  - 3 # [unix and x86_64]
  - 4 # [unix and x86_64]
```

```bash
# build.sh
if [[ "${microarch_level}" == "4" ]]; then
    CXXFLAGS="${CXXFLAGS} -march=x86-64-v4"
fi
```

```yaml
# meta.yaml
requirements:
  run:
    - _x86_64-microarch-level 4 # [unix and x86_64 and microarch_level == 4]
```
Using that workaround, we were able to produce optimized binaries (including `-march=x86-64-v4`) in the graph-tool feedstock (conda-forge/graph-tool-feedstock#140).
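
A slightly more general form of the `build.sh` step might look like the following. This is a hypothetical sketch, not something this feedstock provides: the `with_flag` helper is made up for illustration, it assumes `microarch_level` is visible to `build.sh` (as the workaround above already relies on), and it assumes a compiler that accepts `-march=x86-64-v2` through `-march=x86-64-v4` (GCC 11+ or Clang 12+). It adds the flag to both `CFLAGS` and `CXXFLAGS` and skips it if it is already present, so it should not duplicate a flag that was set elsewhere.

```bash
# Hypothetical build.sh fragment; with_flag is an illustrative helper,
# not part of conda-build or any conda-forge package.

# Print "$1" with flag "$2" appended, unless "$2" is already present.
with_flag() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;           # already present: unchanged
        *)        printf '%s\n' "${1:+$1 }$2" ;;  # append, no leading space
    esac
}

# Assumption: microarch_level comes from conda_build_config.yaml;
# fall back to the baseline (level 1, no extra flag) when unset.
level="${microarch_level:-1}"
case "${level}" in
    2|3|4)
        march="-march=x86-64-v${level}"
        CFLAGS="$(with_flag "${CFLAGS:-}" "${march}")"
        CXXFLAGS="$(with_flag "${CXXFLAGS:-}" "${march}")"
        export CFLAGS CXXFLAGS
        ;;
esac
```

For example, with `microarch_level=4` and `CXXFLAGS=-O2`, `CXXFLAGS` becomes `-O2 -march=x86-64-v4`; with `microarch_level` unset or `1`, the flags are left untouched.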
Would it be possible to make that easier for feedstock maintainers, perhaps by having the `microarch-level-feedstock` produce yet another output?

Right now this feedstock produces two packages for each arch, such as:

1. `x86_64-microarch-level`
    - a. Introduces the `-march=x86-64-v${level}` flag in `CFLAGS` etc.
    - b. Introduces a `run_export` to `_x86_64-microarch-level`
2. `_x86_64-microarch-level`
    - a. Introduces a `run` dependency on the appropriate `__archspec` virtual package.

...but it seems like cross-compilation would be easier if we were to split up the functionality from 1.a and 1.b into two separate packages, so we could easily obtain the correct `CFLAGS` without pulling in the `__archspec` dependency. Perhaps we could offer two variants of the package: one that provides both 1.a and 1.b, and another variant that only provides 1.a. (I'm just spitballing here...)
Alternatively, we could just drop the `run_exports` from the `{{ family }}-microarch-level` recipe. In that case, feedstock maintainers could build level-4 packages without needing to add the compiler flag explicitly, but they would be forced to explicitly list the appropriate runtime dependency in their recipe, which could be annoying:
```yaml
requirements:
  build:
    - x86_64-microarch-level {{ microarch_level }} # [unix and x86_64]
    - ppc64le-microarch-level {{ microarch_level }} # [unix and ppc64le]
  run:
    - _x86_64-microarch-level >={{ microarch_level }} # [unix and x86_64]
    - _ppc64le-microarch-level >={{ microarch_level }} # [unix and ppc64le]
```