Description
This ticket tracks adding a Profile-Guided Optimization (PGO) section to the documentation and linking to #9507.
Many thanks to @zamazan4ik for this wonderful content.
Add a section to the documentation explaining that PGO can improve performance substantially (around 25%), and maybe offer some tips for users on how to apply it?
Yes, it would be a great option. It requires almost no resources to maintain (write it once and link to this discussion for the results). Users who are interested in optimizing arrow-datafusion further will then be able to use this information as an additional optimization opportunity. I have several examples of how such documentation can be written (these are for applications, but for a library it should look much the same; a minimal sketch of the PGO workflow itself follows after this list):
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- Clang:
- Rustc: https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
- tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md
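For a Rust project like DataFusion, the basic PGO cycle from the rustc documentation linked above looks roughly like the following. This is a minimal sketch, not an actual DataFusion recipe: the benchmark binary name and the profile directory are placeholders.

```bash
# Build with PGO instrumentation. llvm-profdata below comes from the
# llvm-tools-preview rustup component.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# Run a representative workload (e.g. TPC-H queries) so the instrumented
# binary writes .profraw files into /tmp/pgo-data.
./target/release/my_benchmark   # placeholder binary name

# Merge the raw profiles into a single .profdata file.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Rebuild, letting the compiler optimize with the collected profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```

As a side note, the cargo-pgo helper tool automates this generate/run/merge/use cycle, which may be the simplest thing to recommend in the docs.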
Provide pre-gathered PGO data somehow, so users could build DataFusion with profiles gathered from TPC-H (or ClickBench).
Unfortunately, this approach is trickier in practice. Pre-gathered PGO profiles have multiple issues: for example, incompatibilities between different compiler versions, and profile skew (when a PGO profile was gathered for an older version of the code). As time passes, pre-gathered PGO profiles become less and less effective, so some kind of regular profile regeneration is required.
I could suggest another, similar way: integrate into the build scripts a way to build the library with PGO enabled (based on some workload like TPC-H, ClickBench, any other target workload, or any combination of them; that's up for discussion). On the one hand, users will be able to build the PGO-optimized version of the library. On the other hand, you won't spend maintenance resources on keeping pre-gathered PGO profiles up to date (although that process can be simplified with CI). A rough sketch of such a script is shown below.
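To make the idea concrete, the whole cycle could be wrapped in a small script that takes the training workload as an argument. This is a hypothetical sketch, not an existing DataFusion script; `benchmark_runner` and its `--workload` flag are made-up placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

WORKLOAD="${1:-tpch}"        # e.g. tpch, clickbench, ...
PGO_DIR="$(mktemp -d)"

# Step 1: instrumented release build.
RUSTFLAGS="-Cprofile-generate=${PGO_DIR}" cargo build --release

# Step 2: run the chosen training workload to collect profiles.
./target/release/benchmark_runner --workload "${WORKLOAD}"   # placeholder

# Step 3: merge raw profiles and rebuild with them.
llvm-profdata merge -o "${PGO_DIR}/merged.profdata" "${PGO_DIR}"
RUSTFLAGS="-Cprofile-use=${PGO_DIR}/merged.profdata" cargo build --release
```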
Some examples of PGO build integration into the build scripts:
- Rustc: a CI tool for the multi-stage build
- GCC:
- Clang:
- Python:
- Go: Bash script
- Swift: CMake script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP - Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
- ISPC: CMake scripts
- NodeJS: Configure script
- Android Open Source Project (AOSP):
  - Official documentation
  - Committed PGO profiles: repository
- DMD: Custom build rule
- LDC: GitHub action
- tsv-utils: Makefile
- Erlang OTP: Makefile
- Clingo (PGO enabled only in Spack): Package recipe
- SWI-Prolog:
- hck: Justfile
If you ship prebuilt versions of the library (e.g. a Python wheel), you can also think about pre-optimizing these prebuilt binaries with PGO (based on TPC-H, ClickBench, etc.). As an example, see the Pydantic-core GitHub PR; a sketch of what this could look like is given after this post.
Originally posted by @zamazan4ik in #9507 (reply in thread)
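For the wheel idea mentioned above: since maturin drives cargo under the hood and cargo honors `RUSTFLAGS`, the same two-phase build can in principle be applied to a Python wheel. This is a hedged sketch under that assumption; the training step is a placeholder, and the exact setup used in the Pydantic-core PR may differ.

```bash
# Phase 1: build an instrumented wheel and exercise it.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" maturin build --release
# ...install the instrumented wheel and run a training workload against it...

# Phase 2: merge the collected profiles and rebuild the optimized wheel.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" maturin build --release
```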