Add a section to the documentation explaining that PGO can help substantially (25%) and maybe offer some tips for users to use it? #9561

@alamb

This ticket tracks adding a profile-guided optimization (PGO) section to the documentation and linking it to #9507.

Many thanks to @zamazan4ik for this wonderful content.

Add a section to the documentation explaining that PGO can help substantially (25%) and maybe offer some tips for users to use it?

Yes, it would be a great option. It requires almost no resources to maintain (write it once and link to this discussion for the results). Users who are interested in optimizing arrow-datafusion further will then be able to use this information as an additional optimization opportunity. I have several examples of how such documentation can be written (they are for applications, but for a library it should look similar):

Provide pre-gathered PGO data somehow, so users could build DataFusion with profiles guided from TPCH (or clickbench).

Unfortunately, this approach is trickier in practice. Pre-gathered PGO profiles have several issues, e.g. incompatibilities between compiler versions and profile skew (the profile was gathered for an older version of the code). Over time, pre-gathered profiles become less and less effective, so some kind of regular profile regeneration is required.
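To make the two failure modes above concrete, here is a minimal sketch of a staleness check for a pre-gathered profile. Everything in it is an assumption for illustration: the function name, the idea of recording the compiler version next to the profile, and the `max_commits` threshold are hypothetical and are not part of DataFusion or any PGO tooling.

```python
def profile_problems(gathered_with, building_with, commits_since_gather, max_commits=200):
    """Return the reasons (possibly none) to regenerate a pre-gathered PGO profile.

    gathered_with / building_with: compiler version strings, for example the
    output of `rustc --version` at gather time and at build time.
    commits_since_gather: number of commits made since the profile was gathered.
    max_commits: hypothetical, arbitrary threshold for "too much drift".
    """
    problems = []
    # A profile written by one compiler version may not be usable by another.
    if gathered_with != building_with:
        problems.append("compiler version changed")
    # Profile skew: the profile describes code that has since changed.
    if commits_since_gather > max_commits:
        problems.append("profile skew")
    return problems
```

A CI job could run such a check and trigger regeneration whenever the returned list is non-empty, which is one way to get the "regular PGO profile regeneration" mentioned above.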

I could suggest a similar alternative: integrate into the build scripts a way to build the library with PGO enabled (based on some workload such as TPCH, Clickbench, any other target workload, or any combination of them; this is up for discussion). On the one hand, users will be able to build a PGO-optimized version of the library. On the other hand, you won't spend maintenance resources on keeping pre-gathered PGO profiles up to date (although that process can be simplified with CI).
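As a rough illustration of what such a build-script integration could look like, the sketch below lists the usual four PGO steps for a Rust project: instrumented build, workload run, profile merge, optimized rebuild. It uses rustc's `-Cprofile-generate` / `-Cprofile-use` flags and `llvm-profdata merge`. The helper names, the profile directory, and the workload command are assumptions for illustration, not an existing DataFusion script.

```python
import os
import subprocess


def pgo_build_steps(profile_dir, workload_cmd, merged_profile=None):
    """Return the PGO build steps, in order, as (env_overrides, argv) pairs.

    profile_dir: directory where the instrumented binary writes raw profiles.
    workload_cmd: hypothetical command that runs the target workload
                  (for example a TPCH or Clickbench benchmark runner).
    """
    merged = merged_profile or f"{profile_dir}/merged.profdata"
    return [
        # 1. Build with instrumentation so the binary records profile data.
        ({"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"},
         ["cargo", "build", "--release"]),
        # 2. Run the representative workload; this writes .profraw files.
        ({}, list(workload_cmd)),
        # 3. Merge the raw profiles into a single .profdata file.
        ({}, ["llvm-profdata", "merge", "-o", merged, profile_dir]),
        # 4. Rebuild, letting the compiler optimize using the merged profile.
        ({"RUSTFLAGS": f"-Cprofile-use={merged}"},
         ["cargo", "build", "--release"]),
    ]


def run_steps(steps):
    """Execute each step, stopping at the first failure."""
    for env_overrides, argv in steps:
        subprocess.run(argv, env={**os.environ, **env_overrides}, check=True)
```

Calling `run_steps(pgo_build_steps("/tmp/pgo-data", ["./run-workload"]))` would execute the whole sequence, where `./run-workload` is a placeholder for whatever workload the project settles on. Because the profile is gathered and consumed in the same run, with the same compiler and the same source tree, this avoids the maintenance cost of shipping pre-gathered profiles.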

Some examples of PGO build integration into the build scripts:

If you have some prebuilt versions of the library (e.g. a Python wheel), you can think about pre-optimizing these prebuilt binaries with PGO (based on TPCH, Clickbench, etc.). As an example - Pydantic-core: GitHub PR.

Originally posted by @zamazan4ik in #9507 (reply in thread)

Metadata

Labels

documentation: Improvements or additions to documentation
