Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify handling of stats in the CPython VM #90230

Open
markshannon opened this issue Dec 14, 2021 · 15 comments
Open

Unify handling of stats in the CPython VM #90230

markshannon opened this issue Dec 14, 2021 · 15 comments

Comments

@markshannon
Copy link

@markshannon markshannon commented Dec 14, 2021

BPO 46072
Nosy @tiran, @markshannon, @brandtbucher
PRs
  • #30116
  • #30139
  • #30145
  • #30169
  • #30989
  • #31051
  • #31060
  • #31079
  • #31104
  • #31105
  • #31108
  • #31197
  • #31211
  • #31228
  • #31234
  • #31250
  • #31251
  • #31254
  • #31259
  • #31289
  • #31324
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-12-14.16:17:53.632>
    labels = []
    title = 'Unify handling of stats in the CPython VM'
    updated_at = <Date 2022-02-16.16:50:02.905>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2022-02-16.16:50:02.905>
    actor = 'brandtbucher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2021-12-14.16:17:53.632>
    creator = 'Mark.Shannon'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46072
    keywords = ['patch']
    message_count = 14.0
    messages = ['408543', '408613', '408618', '408702', '408704', '408708', '408787', '412002', '412271', '412350', '412750', '412769', '412901', '413341']
    nosy_count = 3.0
    nosy_names = ['christian.heimes', 'Mark.Shannon', 'brandtbucher']
    pr_nums = ['30116', '30139', '30145', '30169', '30989', '31051', '31060', '31079', '31104', '31105', '31108', '31197', '31211', '31228', '31234', '31250', '31251', '31254', '31259', '31289', '31324']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue46072'
    versions = []

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Dec 14, 2021

    By "stats" I mean the internal numbers gathered by the VM for performance and debugging. This has nothing to do with any statistics module.

    Currently various parts of the VM gather stats: the GC, dicts, the bytecode interpreter, type lookup cache, etc.

    These stats have various compile time flags to turn them on and off. They have differing ways to display the stats, and no unified way to gather stats across different runs.

    For the specialization stats we dump stats, which we can parse to collate stats across runs.

    We should:

    1. Add a --with-pystats config flag (like with-pydebug) to turn on stat gathering at build time.
    2. Convert the other stats to the format used by the specialization stats, so all stats can be parsed and collated.

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Dec 15, 2021

    New changeset 342b93f by Mark Shannon in branch 'main':
    bpo-46072: Add --with-pystats configure option to simplify gathering of VM stats (GH-30116)
    342b93f

    @tiran
    Copy link

    @tiran tiran commented Dec 15, 2021

    Could you please add the new option to Doc/using/configure.rst ?

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Dec 16, 2021

    New changeset 4506bbe by Mark Shannon in branch 'main':
    bpo-46072: Document --enable-stats option. (GH-30139)
    4506bbe

    @tiran
    Copy link

    @tiran tiran commented Dec 16, 2021

    I just noticed that you are using hard-coded paths with /tmp for the pystats directory. That's problematic and opens the possibility of a symlink race attack.

    Could please add exclusive create to _Py_PrintSpecializationStats()? The will prevent symlink attacks. fopen() mode "x" is not generally available in all libcs. You have to combine open() and fdopen():

    int flags = O_WRONLY | O_CREAT | O_EXCL;
    #ifdef O_NOFOLLOW
    flags |= O_NOFOLLOW;
    #endif
    #ifdef O_CLOEXEC
    flags |= O_CLOEXEC;
    #endif
    
    int fd = open(path, flags);
    if (fd >= 0) {
        FILE *fout = fdopen(fd, "w");
    }

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Dec 16, 2021

    The --enable-stats option is for CPython development and shouldn't be turned on for a release version, so I'm not really concerned about people attacking their own machines.

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Dec 17, 2021

    New changeset efd6236 by Mark Shannon in branch 'main':
    bpo-46072: Add top level stats struct (GH-30169)
    efd6236

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Jan 28, 2022

    New changeset 90ab138 by Mark Shannon in branch 'main':
    bpo-46072: Add simple stats for Python calls. (GH-30989)
    90ab138

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Feb 1, 2022

    New changeset 48be46e by Mark Shannon in branch 'main':
    bpo-46072: Add some object layout and allocation stats (GH-31051)
    48be46e

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Feb 2, 2022

    New changeset 187930f by Mark Shannon in branch 'main':
    bpo-46072: Add some frame stats. (GH-31060)
    187930f

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Feb 7, 2022

    New changeset 062460e by Mark Shannon in branch 'main':
    bpo-46072: Improve LOAD_METHOD stats (GH-31104)
    062460e

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Feb 7, 2022

    New changeset 9c979d7 by Mark Shannon in branch 'main':
    bpo-46072: Merge dxpairs into py_stats. (GH-31197)
    9c979d7

    @markshannon
    Copy link
    Author

    @markshannon markshannon commented Feb 9, 2022

    New changeset f71a69a by Mark Shannon in branch 'main':
    bpo-46072: Output stats as markdown with collapsible sections. (GH-31228)
    f71a69a

    @brandtbucher
    Copy link

    @brandtbucher brandtbucher commented Feb 16, 2022

    New changeset 580cd9a by Brandt Bucher in branch 'main':
    bpo-46072: Add detailed failure stats for BINARY_OP (GH-31289)
    580cd9a

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @penguin-wwy
    Copy link

    @penguin-wwy penguin-wwy commented Apr 29, 2022

    Python API should be provided? This allows us to have more accurate statistical information about the status.

    For example, in pyperformance bm_hexiom:

    import _opcode as stats
    def main(loops, level):
        ...
        ...
    
        stats.init_specialization_stats() # Clear unrelated status information
        stats.enable_specialization_stats()
        for _ in range_it: # core loop of benchmark
            stream = io.StringIO()
            solve_file(board, strategy, order, stream)
            output = stream.getvalue()
            stream = None
        stats.disable_specialization_stats()
        print(stats.get_specialization_stats())# Print status information
        ...
        ...
        return dt

    Then get stats dict:

    {'load_attr': {'success': 55, 'failure': 4, 'hit': 72355, 'deferred': 168, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_global': {'success': 81, 'failure': 0, 'hit': 47503, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_method': {'success': 36, 'failure': 0, 'hit': 36698, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_subscr': {'success': 47, 'failure': 48, 'hit': 78278, 'deferred': 2488, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_subscr': {'success': 9, 'failure': 0, 'hit': 3610, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_attr': {'success': 9, 'failure': 0, 'hit': 671, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'call': {'success': 54, 'failure': 9, 'hit': 36321, 'deferred': 1079, 'miss': 783, 'deopt': 14, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_op': {'success': 43, 'failure': 4, 'hit': 11646, 'deferred': 16, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'compare_op': {'success': 37, 'failure': 480, 'hit': 24938, 'deferred': 30707, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 480, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'unpack_sequence': {'success': 2, 'failure': 0, 'hit': 122, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'precall': {'success': 82, 'failure': 9, 'hit': 79148, 'deferred': 234, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)}}
    

    This helps us to more precisely analyze the impact of optimization on benchmark.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants