Unify handling of stats in the CPython VM #90230

markshannon · 2021-12-14T16:17:54Z

BPO	46072
Nosy	@tiran, @markshannon, @brandtbucher
PRs	#30116 #30139 #30145 #30169 #30989 #31051 #31060 #31079 #31104 #31105 #31108 #31197 #31211 #31228 #31234 #31250 #31251 #31254 #31259 #31289 #31324

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-12-14.16:17:53.632>
labels = []
title = 'Unify handling of stats in the CPython VM'
updated_at = <Date 2022-02-16.16:50:02.905>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-02-16.16:50:02.905>
actor = 'brandtbucher'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2021-12-14.16:17:53.632>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 46072
keywords = ['patch']
message_count = 14.0
messages = ['408543', '408613', '408618', '408702', '408704', '408708', '408787', '412002', '412271', '412350', '412750', '412769', '412901', '413341']
nosy_count = 3.0
nosy_names = ['christian.heimes', 'Mark.Shannon', 'brandtbucher']
pr_nums = ['30116', '30139', '30145', '30169', '30989', '31051', '31060', '31079', '31104', '31105', '31108', '31197', '31211', '31228', '31234', '31250', '31251', '31254', '31259', '31289', '31324']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue46072'
versions = []

markshannon · 2021-12-14T16:17:54Z

By "stats" I mean the internal numbers gathered by the VM for performance and debugging. This has nothing to do with any statistics module.

Currently various parts of the VM gather stats: the GC, dicts, the bytecode interpreter, type lookup cache, etc.

These stats have various compile time flags to turn them on and off. They have differing ways to display the stats, and no unified way to gather stats across different runs.

For the specialization stats we dump stats, which we can parse to collate stats across runs.

We should:

Add a --with-pystats config flag (like with-pydebug) to turn on stat gathering at build time.
Convert the other stats to the format used by the specialization stats, so all stats can be parsed and collated.

markshannon · 2021-12-15T15:32:41Z

New changeset 342b93f by Mark Shannon in branch 'main':
bpo-46072: Add --with-pystats configure option to simplify gathering of VM stats (GH-30116)
342b93f

tiran · 2021-12-15T17:14:09Z

Could you please add the new option to Doc/using/configure.rst ?

markshannon · 2021-12-16T13:41:07Z

New changeset 4506bbe by Mark Shannon in branch 'main':
bpo-46072: Document --enable-stats option. (GH-30139)
4506bbe

tiran · 2021-12-16T14:25:02Z

I just noticed that you are using hard-coded paths with /tmp for the pystats directory. That's problematic and opens the possibility of a symlink race attack.

Could please add exclusive create to _Py_PrintSpecializationStats()? The will prevent symlink attacks. fopen() mode "x" is not generally available in all libcs. You have to combine open() and fdopen():

int flags = O_WRONLY | O_CREAT | O_EXCL;
#ifdef O_NOFOLLOW
flags |= O_NOFOLLOW;
#endif
#ifdef O_CLOEXEC
flags |= O_CLOEXEC;
#endif

int fd = open(path, flags);
if (fd >= 0) {
    FILE *fout = fdopen(fd, "w");
}

markshannon · 2021-12-16T14:47:14Z

The --enable-stats option is for CPython development and shouldn't be turned on for a release version, so I'm not really concerned about people attacking their own machines.

markshannon · 2021-12-17T14:48:11Z

New changeset efd6236 by Mark Shannon in branch 'main':
bpo-46072: Add top level stats struct (GH-30169)
efd6236

markshannon · 2022-01-28T15:20:56Z

New changeset 90ab138 by Mark Shannon in branch 'main':
bpo-46072: Add simple stats for Python calls. (GH-30989)
90ab138

markshannon · 2022-02-01T15:05:39Z

New changeset 48be46e by Mark Shannon in branch 'main':
bpo-46072: Add some object layout and allocation stats (GH-31051)
48be46e

markshannon · 2022-02-02T11:02:01Z

New changeset 187930f by Mark Shannon in branch 'main':
bpo-46072: Add some frame stats. (GH-31060)
187930f

markshannon · 2022-02-07T14:31:01Z

New changeset 062460e by Mark Shannon in branch 'main':
bpo-46072: Improve LOAD_METHOD stats (GH-31104)
062460e

markshannon · 2022-02-07T16:51:53Z

New changeset 9c979d7 by Mark Shannon in branch 'main':
bpo-46072: Merge dxpairs into py_stats. (GH-31197)
9c979d7

markshannon · 2022-02-09T12:30:39Z

New changeset f71a69a by Mark Shannon in branch 'main':
bpo-46072: Output stats as markdown with collapsible sections. (GH-31228)
f71a69a

brandtbucher · 2022-02-16T16:50:03Z

New changeset 580cd9a by Brandt Bucher in branch 'main':
bpo-46072: Add detailed failure stats for BINARY_OP (GH-31289)
580cd9a

penguin-wwy · 2022-04-29T03:24:35Z

Python API should be provided? This allows us to have more accurate statistical information about the status.

For example, in pyperformance bm_hexiom:

import _opcode as stats
def main(loops, level):
    ...
    ...

    stats.init_specialization_stats() # Clear unrelated status information
    stats.enable_specialization_stats()
    for _ in range_it: # core loop of benchmark
        stream = io.StringIO()
        solve_file(board, strategy, order, stream)
        output = stream.getvalue()
        stream = None
    stats.disable_specialization_stats()
    print(stats.get_specialization_stats())# Print status information
    ...
    ...
    return dt

Then get stats dict:

{'load_attr': {'success': 55, 'failure': 4, 'hit': 72355, 'deferred': 168, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_global': {'success': 81, 'failure': 0, 'hit': 47503, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_method': {'success': 36, 'failure': 0, 'hit': 36698, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_subscr': {'success': 47, 'failure': 48, 'hit': 78278, 'deferred': 2488, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_subscr': {'success': 9, 'failure': 0, 'hit': 3610, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_attr': {'success': 9, 'failure': 0, 'hit': 671, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'call': {'success': 54, 'failure': 9, 'hit': 36321, 'deferred': 1079, 'miss': 783, 'deopt': 14, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_op': {'success': 43, 'failure': 4, 'hit': 11646, 'deferred': 16, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'compare_op': {'success': 37, 'failure': 480, 'hit': 24938, 'deferred': 30707, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 480, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'unpack_sequence': {'success': 2, 'failure': 0, 'hit': 122, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'precall': {'success': 82, 'failure': 9, 'hit': 79148, 'deferred': 234, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)}}

This helps us to more precisely analyze the impact of optimization on benchmark.

ezio-melotti transferred this issue from another repository Apr 10, 2022

penguin-wwy mentioned this issue May 5, 2022

gh-90230: Python API for specialization stats #92323

Open

python / cpython Public

Unify handling of stats in the CPython VM #90230

Unify handling of stats in the CPython VM #90230

markshannon commented Dec 14, 2021

markshannon commented Dec 14, 2021

markshannon commented Dec 15, 2021

tiran commented Dec 15, 2021

markshannon commented Dec 16, 2021

tiran commented Dec 16, 2021

markshannon commented Dec 16, 2021

markshannon commented Dec 17, 2021

markshannon commented Jan 28, 2022

markshannon commented Feb 1, 2022

markshannon commented Feb 2, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 9, 2022

brandtbucher commented Feb 16, 2022

penguin-wwy commented Apr 29, 2022 •

edited

python / cpython Public

Unify handling of stats in the CPython VM #90230

Unify handling of stats in the CPython VM #90230

Comments

markshannon commented Dec 14, 2021

markshannon commented Dec 14, 2021

markshannon commented Dec 15, 2021

tiran commented Dec 15, 2021

markshannon commented Dec 16, 2021

tiran commented Dec 16, 2021

markshannon commented Dec 16, 2021

markshannon commented Dec 17, 2021

markshannon commented Jan 28, 2022

markshannon commented Feb 1, 2022

markshannon commented Feb 2, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 9, 2022

brandtbucher commented Feb 16, 2022

penguin-wwy commented Apr 29, 2022 • edited

penguin-wwy commented Apr 29, 2022 •

edited