Skip to content

Conversation

@dpgeorge
Copy link
Member

Based on #4265, with changes to build on all ports.

@dpgeorge dpgeorge force-pushed the py-profiling-support branch 3 times, most recently from 8c05738 to 960c8e4 Compare August 20, 2019 04:06
@dpgeorge
Copy link
Member Author

@malinah Here I've taken your PR and added some commits (fixes) to get it building on most of the ports.

There a problem running the test tests/extmod/vfs_fat_fileio1.py under the unix coverage build with settrace enabled: this test has finalisers in it and the finaliser calls some Python code. This then locks up in mp_obj_new_frame() because it tries to allocate memory for a frame when the GC is currently in the middle of a sweep cycle (so no memory allocation is possible at all). Probably the only way to handle this is to just skip profiling altogether if the GC is locked, rather than try to unlock it (which might not always work, as found out here).

Also, I'd like to rename the py/profiling.[ch] files to py/profile.[ch] to match the way other files are named (eg parse.h, compile.h).

@malinah
Copy link
Contributor

malinah commented Aug 21, 2019

@malinah Here I've taken your PR and added some commits (fixes) to get it building on most of the ports.

Wow. Awesome!

There a problem running the test tests/extmod/vfs_fat_fileio1.py under the unix coverage build with settrace enabled: this test has finalisers in it and the finaliser calls some Python code. This then locks up in mp_obj_new_frame() because it tries to allocate memory for a frame when the GC is currently in the middle of a sweep cycle (so no memory allocation is possible at all). Probably the only way to handle this is to just skip profiling altogether if the GC is locked, rather than try to unlock it (which might not always work, as found out here).

Ah. How is it possible for the mp_obj_new_frame() being executed while GC is doing sweep?
My understanding is that gc_sweep is atomic operation.

Anyway. The frame and subsequent code objects created in the mp_obj_new_frame() are complementary parts of code_state so maybe they should be allocated in the same way although it's quite complicated as seen in py/objfun.c:fun_bc_call:

    // allocate state for locals and stack
    mp_code_state_t *code_state = NULL;
    #if MICROPY_ENABLE_PYSTACK
    code_state = mp_pystack_alloc(sizeof(mp_code_state_t) + state_size);
    #else
    if (state_size > VM_MAX_STATE_ON_STACK) {
        code_state = m_new_obj_var_maybe(mp_code_state_t, byte, state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        if (code_state != NULL) {
            memset(code_state->state, 0, state_size);
        }
        #endif
    }
    if (code_state == NULL) {
        code_state = alloca(sizeof(mp_code_state_t) + state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        memset(code_state->state, 0, state_size);
        #endif
        state_size = 0; // indicate that we allocated using alloca
    }
    #endif

Later when the trace callback is invoked the actual micropython objects are created. This should be the correct solution. Although a bit slower and memory inefficient.

This hack in profiling.c should be removed then:

    bool gc_is_locked_tmp = gc_is_locked();
    if (gc_is_locked_tmp) {
        gc_unlock();
    }

BTW What is the use-case for the gc_lock()? I have seen it only in a few places (hardware related code) but I didn't figure out why is it needed there.

Also, I'd like to rename the py/profiling.[ch] files to py/profile.[ch] to match the way other files are named (eg parse.h, compile.h).

Sure, no problem.
FIY even more correct name would be trace.[ch] but the word trace is already over-used in different contexts so naming it profiling to differentiate the feature was close enough semantically and far enough syntactically. And I chose profiling instead of profile because it looks less like something to do with actual profiles / config profiles / user profiles ... . 😕

@dpgeorge
Copy link
Member Author

My understanding is that gc_sweep is atomic operation.

No it's not. Sweeping can run finalisers (during the sweep) which can run arbitrary Python code. Such finalisers are restricted in that they can't allocate memory.

Anyway. The frame and subsequent code objects created in the mp_obj_new_frame() are complementary parts of code_state so maybe they should be allocated in the same way

This does sound like a reasonable solution, but as you say it's going to use a lot of RAM.

Another option is to simply not create a frame when the GC is locked and don't do any tracing/profiling for such calls. I think it's acceptable for now that heap locking restricts what can be profiled. I've push a commit to do this.

What is the use-case for the gc_lock()?

It's needed by hard interrupt handlers: the IRQ may interrupt the GC during a collection phase, and the IRQ handler must still execute at that time (it cannot wait until the collection is finished), so the only thing to do is lock out any heap allocation for the duration of the IRQ handler.

FIY even more correct name would be trace.[ch] but the word trace is already over-used in different contexts

"trace" is currently only used for "traceback" in the py/ directory, but I agree that a file called py/trace.c would be expected to contain traceback code/utils.

I think the word "profile" is a good choice, and being part of a Python implementation I don't think it'd be mistaken for user profiles.

@malinah
Copy link
Contributor

malinah commented Aug 23, 2019

OK. Everything looks good now :)

@dpgeorge
Copy link
Member Author

I added initial support for tracing/profiling of frozen bytecode.

@dpgeorge dpgeorge force-pushed the py-profiling-support branch from d953491 to 054bbd8 Compare August 26, 2019 04:35
@dpgeorge dpgeorge force-pushed the py-profiling-support branch 4 times, most recently from aa74c4a to 0938050 Compare August 27, 2019 07:15
@dpgeorge
Copy link
Member Author

@malinah this is very close to being ready for merge. I added a new Travis CI target to test the settrace code and it is passing.

I also renamed the test to misc/sys_settrace_profile.py to match other test names.

One remaining question: I see you added the .py.exp file for the test, and that the output from CPython does not match this expected output (it seems filenames are handled differently?). Is there a reason you couldn't make the output match exactly MicroPython<->CPython? I might try to make it so the exp file doesn't need to be committed but rather can be generated by CPython like most of the other tests.

Copy link
Contributor

@malinah malinah Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line from uio import open is obsoleted. Please remove it.

@malinah
Copy link
Contributor

malinah commented Aug 27, 2019

@malinah this is very close to being ready for merge. I added a new Travis CI target to test the settrace code and it is passing.

Awesome!

One remaining question: I see you added the .py.exp file for the test, and that the output from CPython does not match this expected output (it seems filenames are handled differently?).

I didn't notice filenames being handled differently. When executed from the same directory tests the uPy and CPy should output the same absolute file paths which is then trimmed in the test itself.

Is there a reason you couldn't make the output match exactly MicroPython<->CPython? I might try to make it so the exp file doesn't need to be committed but rather can be generated by CPython like most of the other tests.

Yeah. That was my idea too but right now I don't think it's easily doable. The main reason being the uPy compiler produces different bytecode for conditional jumps in loops, generators, exceptions and probably something else I can't remember right now. When you run the test sys_settrace_profile.py and compare side by side you will see that uPy first jumps at the end of the loop where the conditional jump is and then makes the first iteration at the line the loop starts. This messes up the line numbering compared to CPython.

@dpgeorge dpgeorge force-pushed the py-profiling-support branch from 0938050 to d8884ab Compare August 28, 2019 01:42
@dpgeorge
Copy link
Member Author

I didn't notice filenames being handled differently. When executed from the same directory tests the uPy and CPy should output the same absolute file paths which is then trimmed in the test itself.

You're right, I made a mistake and wasn't executing them from the tests directory.

Yeah. That was my idea too but right now I don't think it's easily doable. The main reason being the uPy compiler produces different bytecode for conditional jumps in loops, generators, exceptions and probably something else I can't remember right now.

In the latest set of commits I have somewhat implemented this: I pulled out the tests that were different between CPy and uPy into sys_settrace_loop.py and sys_settrace_generator.py and included .exp files for these 2. Then renamed sys_settrace_profile.py to sys_settrace_features.py and removed its corresponding .exp file (which was large and is now no longer needed because uPy matches CPy). I also adjusted how the const-based tests work to make them pass correctly on both uPy and CPy.

So now this PR should be good to go in!

@malinah
Copy link
Contributor

malinah commented Aug 28, 2019

In the latest set of commits I have somewhat implemented this: I pulled out the tests that were different between CPy and uPy into sys_settrace_loop.py and sys_settrace_generator.py and included .exp files for these 2. Then renamed sys_settrace_profile.py to sys_settrace_features.py and removed its corresponding .exp file (which was large and is now no longer needed because uPy matches CPy). I also adjusted how the const-based tests work to make them pass correctly on both uPy and CPy.

So now this PR should be good to go in!

Well done! :)

@dpgeorge dpgeorge force-pushed the py-profiling-support branch 2 times, most recently from adb0fb0 to 53c76f2 Compare August 30, 2019 06:06
dpgeorge and others added 8 commits August 30, 2019 16:43
Prior to this patch the line number for a lambda would be "line 1" if the
body of the lambda contained only a simple expression (with no line number
stored in the parse node).  Now the line number is always reported
correctly.
This commit adds support for sys.settrace, allowing to install Python
handlers to trace execution of Python code.  The interface follows CPython
as closely as possible.  The feature is disabled by default and can be
enabled via MICROPY_PY_SYS_SETTRACE.
@dpgeorge dpgeorge force-pushed the py-profiling-support branch from 53c76f2 to 0b85b5b Compare August 30, 2019 06:50
@dpgeorge dpgeorge merged commit 0b85b5b into micropython:master Aug 30, 2019
@dpgeorge
Copy link
Member Author

This is now merged! Thank you very much @malinah for the contribution, really great work!

Note: I did a little more cleaning of the code before merging, including factoring out the mp_bytecode_get_source_line() function.

@prusnak
Copy link
Contributor

prusnak commented Aug 30, 2019

Congratulations to you both!

@malinah
Copy link
Contributor

malinah commented Aug 30, 2019

This is now merged! Thank you very much @malinah for the contribution, really great work!

Thank you @dpgeorge for your patience and great support!

@dpgeorge
Copy link
Member Author

Great collaboration all round!

@stinos
Copy link
Contributor

stinos commented Aug 31, 2019

Yes this is a really nice addition, opening the door for a debugger :)

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Jul 21, 2021
Fix crash when UART construct fails
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants