This project will no longer be maintained by Intel.
Intel will not provide or guarantee development of or support for this project, including, but not limited to, maintenance, bug fixes, new releases, or updates.
Patches to this project are no longer accepted by Intel.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.
thin-layout-optimizer is a project that takes a performance profile from a
tool like perf and uses the data (particularly LBR data) to optimize the
layout of the profiled DSOs' functions.

Currently it uses a basic hfsort algorithm (or other re-ordering
algorithms) to do this. The output is a text file with information from
the profiles (including the function order), which is meant to be consumed
by the `finalize-order.py` script to create a function order list for
either `ld` or `gold`.
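As a rough illustration of what an hfsort-style pass does (a simplified toy sketch, not this project's actual implementation), functions start as singleton clusters and are greedily merged along the hottest call edges so that frequent caller/callee pairs land adjacent in the final layout:

```python
# Toy sketch of hfsort-style greedy clustering (NOT the real implementation).
# Functions start as singleton clusters; we walk call edges hottest-first and
# merge the callee's cluster onto the caller's so hot pairs become adjacent.

def hfsort_sketch(edges):
    """edges: list of (caller, callee, weight). Returns an ordered function list."""
    cluster_of = {}   # function -> cluster id
    clusters = {}     # cluster id -> ordered list of functions
    next_id = 0
    for a, b, _ in edges:
        for f in (a, b):
            if f not in cluster_of:
                cluster_of[f] = next_id
                clusters[next_id] = [f]
                next_id += 1
    # Process edges from hottest to coldest.
    for a, b, _ in sorted(edges, key=lambda e: -e[2]):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # caller and callee already placed together
        # Append the callee's cluster after the caller's cluster.
        clusters[ca].extend(clusters[cb])
        for f in clusters[cb]:
            cluster_of[f] = ca
        del clusters[cb]
    # Concatenate the remaining clusters into the final layout.
    return [f for c in clusters.values() for f in c]

order = hfsort_sketch([
    ("main", "parse", 100),
    ("main", "log", 1),
    ("parse", "lex", 90),
])
print(order)
```

The real algorithm additionally caps cluster sizes and weighs edge density, but the core idea is the same: hot call chains end up contiguous in the text section.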
- Build:

  ```
  mkdir -p build && (cd build && cmake -GNinja .. && ninja)
  ```

  Note: There are soft dependencies on `zstd` and `gtest`. Without `zstd` there is no support for reading compressed profiles. Without `gtest` the tests don't work. `cmake` will try to find the respective system packages for the two dependencies above. Alternatively you can set them manually with `ZSTD_PATH` and `GTEST_PATH` respectively. For example:

  ```
  cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DZSTD_PATH=/home/noah/programs/libraries/zstd/build-std-flto
  ```

- Run the tests:

  ```
  ninja check-all
  ```
- Build with sanitizers:

  ```
  ninja san
  ```

  This will produce builds for `asan`, `usan`, `lsan`, and (if compiling with `clang`) `msan`. Note: Sanitized tests can be run with `ninja check-all-san` (or `check-all-all` to also run the non-sanitized tests).

  Note: To support compression with `msan`, you will need to provide a path to `zstd` built with `msan`. For example:

  ```
  cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DZSTD_PATH=/home/noah/programs/libraries/zstd/build-std-flto -DZSTD_MSAN_PATH=/home/noah/programs/libraries/zstd/build-msan16
  ```
- Since `ld` does not natively support passing an independent function-ordering file, we require patches to `ld`. The patches add a new option to `ld`, `--text-section-ordering-file`, which accepts an ordering file. This can be difficult to generically add to existing link steps, as they might rely on pre-existing linker scripts. To simplify this process we have a patch for `ld.2.41` (the most recent release as of writing) to read a linker script from an environment variable, `LD_ORDERING_SCRIPT`. With the patch, if no linker script argument was provided, `ld` will check whether `LD_ORDERING_SCRIPT` contains a valid script. Furthermore, since it can be difficult to juggle the right environment variable for multiple DSOs, there is also an environment variable `LD_ORDERING_SCRIPT_MAP`. This variable acts as a map from DSO -> ordering script. When linking, if there is no command-line or explicit env linker script provided, `ld` will look up the output DSO in the file that `LD_ORDERING_SCRIPT_MAP` is set to. If it finds the output DSO, it will use the corresponding entry as the linker script.
- Apply the patches:

  ```
  git am patches/ld/*.patch
  ```
- Usage (`LD_ORDERING_SCRIPT`):

  ```
  export LD_ORDERING_SCRIPT=ordering.lds; ld ...
  ```

  NOTE: If the `.lds` linker script file exists, it must be a syntactically valid linker script, otherwise `ld` will fail. If the file doesn't exist it will be ignored.
- Usage (`LD_ORDERING_SCRIPT_MAP`):

  Map syntax:

  ```
  target:<target_name0>,<target_name1>,...<target_nameN>
  DSO0 <ordering-script0>
  DSO1 <ordering-script1>
  ...
  DSON <ordering-scriptN>
  ```

  ```
  export LD_ORDERING_SCRIPT_MAP=ordering_map; ld ...
  ```

  Note the target name is optional. If it is present, the ordering script map will only be used if the target string matches the link target for `ld`/`gold`. If it is not present, the map will always be used. The target name is the name used for the emulation (`-m`) option of `ld` (even if using `gold`, specify the `ld` emulation name). For example, to make a map file for only `x86_64`, specify `target:elf_x86_64` before any DSOs.

  Note: Prior to reaching a `target:` line, DSOs will be matched to any target. This also means that if `target:` is not present, all the DSOs will be matched regardless of target.
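For illustration, following the syntax above, a hypothetical `ordering_map` for an `x86_64`-only link might look like this (all DSO names and paths here are made up):

```
target:elf_x86_64
libfoo.so /home/user/orders/libfoo.so.lds
mybinary /home/user/orders/mybinary.lds
```

With this map exported via `LD_ORDERING_SCRIPT_MAP`, a link whose output DSO is `libfoo.so` would pick up `/home/user/orders/libfoo.so.lds` automatically.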
- Unpatched `ld`: It is possible to use the `finalize-order.py` script to generate an `ld` linker script using the `ld.orig` target. The generated linker script will have the ordering embedded in a best guess of the default linker script. While you may have some success with this, it is not advised, as some `ld` options may effectively be overridden by replacing the default linker script.
- Equivalent patches exist for `gold`, which allow the section ordering script to be passed through the `GOLD_ORDERING_SCRIPT` or `GOLD_ORDERING_SCRIPT_MAP` env variables. That said, `gold` natively supports a section ordering file with the `--section-ordering-file` argument.
- Usage without patches:

  ```
  gold --section-ordering-file ordering.txt ...
  ```

  NOTE: The `gold` linker tends to do a slightly better job than `ld`.
- Setup patches: `patches/ld/*.patch`
- Usage (`GOLD_ORDERING_SCRIPT`) with patches:

  ```
  export GOLD_ORDERING_SCRIPT=ordering.gold; gold ...
  ```

- Usage (`GOLD_ORDERING_SCRIPT_MAP`) with patches:

  ```
  export GOLD_ORDERING_SCRIPT_MAP=ordering_map; gold ...
  ```
1. Collect a profile of the system using:

   ```
   perf record -e cycles:u,branch-misses -j any,u -a
   ```
2. Package the result. This will copy all the referenced DSOs plus some debug files to a new `tar` file. `thin-layout-optimizer` will use the copied DSOs as its references (as opposed to the system ones which, if updated, would change the addresses of functions and make the old data unusable).

   ```
   python3 scripts/package.py <input:perf.data file> <dst:packaged-profile>
   ```

   `<dst>` will always be a `.tar.gz` file, even if you didn't specify `.tar.gz`.
3. Unpackage the results when you are ready to create the function ordering(s).

   ```
   python3 scripts/unpackage.py <src:packaged-profile.tar.gz> <dst:unpackaged-profile dir>
   ```

   This is really just a wrapper for untarring the `tar` created in Step 2.

   Note: Both the `package.py` and `unpackage.py` scripts accept a 3rd optional argument, `compress`. If this is provided they will run `perf` on the `perf.data` profile and compress the output (with `zstd`). Once you have the compressed `zst` files, it is fine to delete the `perf.data` file.
4. Run `thin-layout-optimizer`:

   ```
   thin-layout-optimizer -r <src:unpackaged-profile dir> -o <dst:dir-for-ordering-file> --save <dst:saved-state-file>
   ```

   Alternatively, if you just want to save a profile to combine with other future profiles, or just to create ordering files later, you can use the save/reload commands shown in the next step.

   Note: There are more options (see `thin-layout-optimizer -h`). For the most part, just using the above command should be all you need.
5. (Optional) Save/reload from saved states.

   When running `thin-layout-optimizer` with a new `perf.data` profile, you can use the `--save` option to store the state just before call-graph creation. After creating a save-state, you can re-run `thin-layout-optimizer` with the `--reload` option to avoid the time-consuming task of re-processing the `perf.data` files. You can also combine multiple save-states with the option. For example:

   - Saving a profile:

     ```
     thin-layout-optimizer -r <src:unpackaged-profile dir> --save <dst:saved-state-file>
     ```

   - Reloading a save state:

     ```
     thin-layout-optimizer --reload <src:saved-state-file> -o <dst:dir-for-ordering-file>
     ```

   - Reloading and combining multiple saved states:

     ```
     thin-layout-optimizer --reload <src:saved-state-file0>,<src:saved-state-file1>,<src:saved-state-fileN> --save <dst:combined-save-state-file>
     ```
6. Finalize ordering for a target.

   Note: This section assumes usage of the `ld`/`gold` patches with the `*_ORDERING_SCRIPT_MAP` env variable.

   - Once a set of ordering files has been created, the final step is to create the linker scripts to be used for re-linking. Note the ordering files can either be from a `perf.data` file, save states, or a combination of the two.
   - The method for creating the `*_ORDERING_SCRIPT_MAP` and linker scripts from the ordering files is to use the `scripts/finalize-order.py` script:
   ```
   python3 scripts/finalize-order.py -i <src:dir-with-ordering-files> -t <'ld', 'ld.orig', or 'gold'> -o <dst:directory for linker scripts> -m <dst:map file for env variable> -p <prefix for paths in map file> --align-hot-n <float:0-100> --align-till <float:0-100> --alignment=<int:0-12> --align-per-dso --aliases <file:alias-file>
   ```

   - The `-i` argument is the directory where you saved the ordering files (the `-o` argument from `thin-layout-optimizer`).
   - The `-m` argument will create the file you should set `*_ORDERING_SCRIPT_MAP` to.
   - The `-o` argument is where the actual linker scripts are stored.
   - The `-p` argument sets the prefix for where the directory with the scripts should be searched for by the `*_ORDERING_SCRIPT_MAP`. I.e. if you use `-p ~/foobar`, the `*_ORDERING_SCRIPT_MAP` will contain entries like `<dso name> ~/foobar/<linker script for dso name>`. If the `-p` option isn't present, it will default to what was used for `-o`.
   - The `--alignment` option specifies the log2 of the alignment we want. I.e. `--alignment=5` (default) will align functions to 32 bytes. `--alignment=10` will likewise be 1024-byte alignment.
   - The `--align-hot-n` option specifies that we want to add `--alignment` to the top N percent of called functions. I.e. if `--align-hot-n=1` and `--alignment=5`, we will align the functions that are in the 99th percentile of number of incoming calls to 32 bytes.
   - The `--align-till` option specifies that we want to add `--alignment` to functions until we account for N percent of total calls. I.e. if `--align-till=33` and `--alignment=12`, we will keep aligning functions to 4096 bytes until 33% of the total incoming edges have been accounted for.
   - The `--align-per-dso` option takes the above two constraints (`--align-hot-n` and `--align-till`) and, instead of applying them globally, applies them for each DSO. For example, say we have two DSOs `A` and `B` with `A` containing 99% of the total calls. If you don't specify `--align-per-dso`, we will align functions in both `A`/`B` based on their global hotness (so likely no function from `B`). If you do specify `--align-per-dso`, we will apply both alignment constraints to `A` and `B` independently, regardless of their relative weight.
   - The `--aliases` option takes a file that maps one target to another. This is useful if the command line profiled differs from the build target, or if you want to use the same ordering file for multiple targets. Note: if the `--aliases` argument is missing, the script will try to find one at the path specified by the env variable `ORDERING_SCRIPT_ALIASES`.
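The interaction of the alignment options can be sketched in a few lines of Python (a toy model of the selection logic, not `finalize-order.py` itself; the call counts are invented):

```python
# Toy model of the alignment options (NOT scripts/finalize-order.py itself).
alignment = 5                 # --alignment: log2 of the byte alignment
align_bytes = 1 << alignment  # --alignment=5 -> 32-byte alignment

# Hypothetical per-function incoming-call counts.
counts = {"hot1": 900, "hot2": 60, "warm": 30, "cold": 10}
total = sum(counts.values())

# --align-hot-n=25: align the top 25% most-called functions.
ranked = sorted(counts, key=counts.get, reverse=True)
hot_n = ranked[: max(1, len(ranked) * 25 // 100)]

# --align-till=96: keep aligning (hottest first) until 96% of calls covered.
covered, till = 0, []
for func in ranked:
    if covered >= total * 96 // 100:
        break
    till.append(func)
    covered += counts[func]

print(align_bytes, hot_n, till)
```

With these invented counts, `--align-hot-n` picks only the single hottest function, while `--align-till` keeps going until the cumulative call coverage crosses the threshold.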
   Note: An example from running this script when targeting llvm:

   - For `gold` (with an explicit prefix):

     ```
     python3 scripts/finalize-order.py -i orders/ -t gold -o llvm-orders/ -m llvm-orders-map.gold -p ~/.llvm-orders
     ```

   - For `ld` (without an explicit prefix):

     ```
     python3 scripts/finalize-order.py -i orders/ -t ld -o ~/.llvm-orders/ -m llvm-orders-map.ld
     ```

   By default the script will name the actual script files with a `.ld` or `.gold` extension based on the target, so scripts for `ld`/`gold` can safely be stored in the same directory.

   NOTE: By default `finalize-order.py` will emit lines to find each function at:

   ```
   .text.<func>
   .text.hot.<func>
   .text.cold.<func>
   .text.unlikely.<func>
   ```

   To skip the `cold` and `unlikely` locations, pass `--no-cold` to `finalize-order.py`.
7. Rebuild the project.

   - For `ld`, either pass the output `order-script.lds` to `ld` directly with the `-T` option, or if using the patched version, set the env variable: `export LD_ORDERING_SCRIPT=/path/to/order-script.lds`
   - For `gold`, you can use the builtin argument `--section-ordering-file`.
8. Verify the new order.

   To verify a new order, you can use `scripts/compare-order.py`. This will take the ordering in a built binary, compare it with an ordering file, and produce a "score" for how close the binary and the ordering file are. The lower the score, the closer the orders are. Note the ordering file referenced here is not the output script used in the link stage, but the associated file from Step 4 in directory `<dir-for-ordering-file>`.

   ```
   python3 scripts/compare-order.py <src:ordering file> <src:bin0> ... <src:binN>
   ```

   I.e. when using for LLVM:

   ```
   python3 scripts/compare-order.py orders-new-new/ordering--home-noah-programs-opensource-llvm-dev-src-project3-build-ld-bin-clang-18.txt /home/noah/programs/opensource/llvm-dev/src/project3/build-gold-order/bin/clang-18 /home/noah/programs/opensource/llvm-dev/src/project3/build-ld-order/bin/clang-18 /home/noah/programs/opensource/llvm-dev/src/project3/build-ld/bin/clang-18
   /home/noah/programs/opensource/llvm-dev/src/project3/build-gold-order/bin/clang-18 -> 12.72/12.72
   /home/noah/programs/opensource/llvm-dev/src/project3/build-ld-order/bin/clang-18 -> 68.423/68.423
   /home/noah/programs/opensource/llvm-dev/src/project3/build-no-order/bin/clang-18 -> 16963.985/16963.985
   ```

   Note: Generally there will be no difference between the two numbers; if there is, it's an indication that the profile was too minimal.
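One way to picture such a score (purely illustrative; not necessarily the metric `compare-order.py` computes) is the mean displacement between each function's rank in the binary and its rank in the ordering file:

```python
# Illustrative order-closeness score: mean absolute rank displacement.
# (Not necessarily the metric compare-order.py uses.)

def order_score(ordering_file, binary_order):
    """Lower is better; 0.0 means the binary matches the ordering exactly."""
    want = {func: i for i, func in enumerate(ordering_file)}
    # Only score functions present in both the ordering file and the binary.
    common = [f for f in binary_order if f in want]
    have = {func: i for i, func in enumerate(common)}
    return sum(abs(want[f] - have[f]) for f in common) / max(1, len(common))

perfect = order_score(["a", "b", "c"], ["a", "b", "c"])  # 0.0
swapped = order_score(["a", "b", "c"], ["c", "b", "a"])
print(perfect, swapped)
```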
- To get an idea if new linker scripts are needed, based on new benchmark results, you can use the `compare-dir.py` script. This script will see how different the function orders in two directories are. Then, based on the result, you can decide if the results warrant creating new linker scripts.
- Get orders for benchmarks:

  ```
  thin-layout-optimizer -r <src:unpackaged-profile dir benchmark0> -o <dst:dir-for-ordering-files0> --save <dst:saved-state-file0>
  thin-layout-optimizer -r <src:unpackaged-profile dir benchmark1> -o <dst:dir-for-ordering-files1> --save <dst:saved-state-file1>
  ```

- Compare output directories:

  ```
  python3 scripts/compare-dir.py <dst:dir-for-ordering-files0> <dst:dir-for-ordering-files1>
  ```

  This will show the difference of all common and distinct DSOs in both order directories, as well as a summary at the end.
- Advanced DSO matching: By default `compare-dir.py` will match `.so` files across different versions (i.e. match `foo.so.1.1` with `foo.so.1.2`). To disable this, pass `--exact-version`.
- Common DSOs only: To only compare the common DSOs (skip the summary of DSOs that are not in both directories), use the `--ignore-distinct` option.
By default, when you reload multiple save-states, the save states are all normalized. This ensures that longer-running and shorter-running (but equally important) benchmarks are weighted the same when recombined.
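The effect of normalization can be sketched as follows (a conceptual toy, assuming each save state is reduced to a dict of edge weights): dividing each profile's weights by its own total keeps a benchmark that simply ran 10x longer from dominating the combination.

```python
# Conceptual sketch of why save states are normalized before combining
# (a toy model, not the actual save-state format).

def normalize(weights):
    """Scale a profile's edge weights so they sum to 1.0."""
    total = sum(weights.values())
    return {edge: w / total for edge, w in weights.items()}

# A long-running and a short-running benchmark covering the same edges.
long_run = {("a", "b"): 9000, ("a", "c"): 1000}   # 10,000 samples
short_run = {("a", "b"): 100, ("a", "c"): 900}    # 1,000 samples

# Raw combination: long_run dominates purely because it ran longer.
raw = {e: long_run[e] + short_run[e] for e in long_run}

# Normalized combination: both benchmarks contribute equally.
combined = {
    e: normalize(long_run)[e] + normalize(short_run)[e] for e in long_run
}
print(raw, combined)
```

In the raw sum, the `("a", "b")` edge looks overwhelmingly hot only because the first benchmark collected 10x the samples; after normalization the two edges carry equal combined weight.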
- Normalization: Normalization is on by default. To disable it you can use either `--no-normalize` or `--force-no-normalize`. The only difference between these two options is that `--no-normalize` will fail if the reloaded save states have a mix of already-normalized and non-normalized files (it's pretty easy to see why this is likely not desirable). The `--force-no-normalize` option will still warn in this case, but won't kill the process. An example use might be:

  ```
  thin-layout-optimizer --no-normalize --reload <src:saved-state-file0>,<src:saved-state-file1>,<src:saved-state-fileN> --save <dst:combined-save-state-file>
  ```
- Scaling: It may be desirable to scale certain benchmarks. I.e. there may be a set of very important benchmarks, but also some less important ones that still provide meaningful coverage. To scale certain benchmarks (increase / decrease their importance) you can either use the `--add-scale` option when creating a save state, or manually edit a save state.
  - Adding a scale by hand: Under the hood, the save states are just JSON files. To manually add a scale you can insert entries under the `"scaling"` field:

    ```
    "scaling": {
        "scale": <Some Double above 0.0>
    }
    ```

    It's possible the `"scaling"` field won't already exist in the save state. If that's the case, just add it.

    You might also see other options in the `"scaling"` field. For example:

    ```
    "scaling": {
        "edge_normalized": true,
        "func_normalized": true,
        "edge_scale": 2.5,
        "func_scale": 2.0
    },
    ```

    The `"edge_normalized"` and `"func_normalized"` fields should never be modified by hand. These are for tracking which files have been normalized.

    The `"edge_scale"` and `"func_scale"` options are equivalent to `"scale"` but with a bit more precision as to what they apply to. To scale only weights among edges, but not function sample rate, or vice versa, you can modify/insert these fields independently. If the `"scale"` field exists, its value will override both `"func_scale"` and `"edge_scale"`. For example with:

    ```
    "scaling": {
        "edge_normalized": true,
        "func_normalized": true,
        "edge_scale": 2.5,
        "func_scale": 2.0,
        "scale": 3.0
    },
    ```

    both function samples and edge samples will be scaled by 3.0.
  - Adding a scale with the command line: To add a scale to a single file, instead of modifying by hand, you can also do:

    ```
    thin-layout-optimizer --no-normalize --add-scale <Some Double Above 0.0> --reload <src:saved-state-file0> --save <dst:saved-state-file0-with-associated-scale>
    ```

    The `--no-normalize` is not strictly necessary, but will preserve the original values.

    Likewise, if you wish to combine multiple save states and associate a scale with the combined save state, you can do:

    ```
    thin-layout-optimizer --add-scale <Some Double Above 0.0> --reload <src:saved-state-file0>,<src:saved-state-file1>,<src:saved-state-fileN> --save <dst:saved-state-file0-N-with-associated-scale>
    ```

    Whether you normalize is again optional; just keep in mind that once a save state has been normalized, its original precise values will have been lost.
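A hand edit like the one described above can also be scripted. This hypothetical helper (`add_scale` is not part of the project; only the `"scaling"`/`"scale"` field names come from the save-state format described above) loads the save-state JSON, inserts the field, and writes the file back:

```python
import json

def add_scale(path, scale):
    """Insert/overwrite the "scaling" -> "scale" entry of a save-state JSON.
    Assumes only that the save state is a JSON object; the rest of its
    layout is left untouched."""
    with open(path) as f:
        state = json.load(f)
    # Create the "scaling" field if the save state doesn't have one yet.
    state.setdefault("scaling", {})["scale"] = scale
    with open(path, "w") as f:
        json.dump(state, f, indent=4)

# Example: tag a save state so a later --use-custom-scale run doubles it.
# add_scale("benchmark0.save", 2.0)
```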
  - Using a scale: To actually scale a file, use the `--use-custom-scale` command-line option:

    ```
    thin-layout-optimizer --no-normalize --use-custom-scale --reload <src:saved-state-file0-with-associated-scale> --save <dst:saved-state-file0-scaled>
    ```

    The `--use-custom-scale` option will check what's in the `"scaling"` field and use any `"scale"`, `"func_scale"`, and/or `"edge_scale"` values to multiply the relevant metrics.

    **NOTE: You <u>cannot</u> use both `--add-scale` and `--use-custom-scale` at the same time. `--add-scale` only associates a scale with the save state. It does not actually multiply the fields.**

    Finally, you can invoke `--use-custom-scale` when recombining multiple save states:

    ```
    thin-layout-optimizer --use-custom-scale --reload <src:saved-state-file0>,<src:saved-state-file1>,<src:saved-state-fileN> --save <dst:saved-state-file0-N-scaled>
    ```

    When combining multiple save states with `--use-custom-scale`, the scale is applied on top of normalization. As well, each save-state will use its own `"scaling"` field.