Skip to content

Add support for Backtraces and profiling#1550

Merged
JPercival merged 30 commits intocqframework:masterfrom
scymtym:backtrace-and-profile
Jun 28, 2025
Merged

Add support for Backtraces and profiling#1550
JPercival merged 30 commits intocqframework:masterfrom
scymtym:backtrace-and-profile

Conversation

@scymtym
Copy link
Copy Markdown
Contributor

@scymtym scymtym commented May 14, 2025

This is a draft for extending the "stack of windows" concept of the CqlEngine to a stack of ActivationFrames which track more information than just arguments and local variables, namely

  • Expression/function definitions being evaluated in particular activation frames

  • Start and end times for the evaluation for activation frames

  • Cache hit information for certain activation frames

On top of that extension, this pull request adds two features to the CQLEngine

  1. Backtraces in CQL exceptions that represent runtime error. The backtrace information can be used by clients to display backtraces with expression/function names, arguments, local variables and source information. Example:
    cql-repl-errors

  2. Optional profiling. A new engine option can be used to turn enable profiling which the engine to aggregate information from ActivationFrames into a deterministic performance profile that can be used, for example, to render a flamegraph. Example:
    cql-repl-flamegraph

scymtym added 2 commits May 15, 2025 11:27
Each call to CqlEngine.evaluate causes a call to State.init which,
before this change, pushed one window. The actual evaluation
associated with the CqlEngine.evaluate call pushed and popped a
balanced number of windows but the one pushed by State.init remained
on the stack in the end. Successive CqlEngine.evaluate calls would
therefore accumulate an unbounded number of windows on the stack.

As a slight interface change, clients that bypass
CqlEngine.evaluate[Expressions] by using
CqlEngine.getEvaluationVisitor now have to call State.beginEvaluation
and State.endEvaluation themselves. The
ExpressionCacheTest.expression_cache test had to be change in this way.
This makes the association between local variables introduced by, say,
query expressions and the containing expression definition clear. For
function definitions, there is no change since function invocation
already pushed a fresh window.
@scymtym scymtym force-pushed the backtrace-and-profile branch from baa0b81 to 23b902f Compare May 15, 2025 11:14
scymtym added 7 commits May 19, 2025 16:48
The stack of ActivationFrames serves the same purpose as the window
stack but records additional information compared to just groups of
local variables:

* Each ActivationFrame (except the dummy root frame) has an associated
  element which is an ExpressionDef (possibly a FunctionDef) or a
  Retrieve.

* Each ActivationFrame has a start time and an end time so that the
  time spent evaluating the frame can be tracked.

* Each ActivationFrame has an isCached field so that the cache hits
  and misses when evaluating ExpressionDef elements (but not
  FunctionDef elements) can be tracked.

This additional information will be useful for at least two purposes:

1. The associated elements can be used to construct a backtrace with
   source information when an exception is thrown during evaluation.

2. The timing and cache hit information can be used to build a
   deterministic profile of a given evaluation which is useful for
   performance analysis.

Subsequent commits will add code for these two use-cases.
A backtrace represents a stack of call-like frames from the root of an
evaluation to a particular sub-expression, commonly a sub-expression
in which an exception was thrown. Client code can extract this
Backtrace object from a caught exception and use it to display a
backtrace similar to those known from other programming languages.
Before this change, there were two performance issues in the profile
construction code

1. Profile.register traversed the whole stack to find the appropriate
   profile node which is O(N) where N is the depth of the "call stack".

2. Children of profile nodes were stored in a HashMap the keys of
   which relied on object-equality involving Element
   objects. Object-equality for those is very expensive.

This change solves issue 1 by remembering the stack of profile nodes
that correspond to the stack of current activation frames and
incrementally updating the top of the stack which is O(1).

This change solves issue 2 by (basically; the context adds a slight
complication) storing the children of profile nodes in an
IdentityHashMap so that Element.equals is not used.
@scymtym scymtym force-pushed the backtrace-and-profile branch from 23b902f to 5e89f72 Compare May 19, 2025 14:50
@c-schuler c-schuler self-requested a review May 30, 2025 17:17
@c-schuler
Copy link
Copy Markdown
Contributor

c-schuler commented May 30, 2025

This is very cool, thank you @scymtym !

A couple of enhancements that I would suggest:

  1. Introduce tiered profiling levels (e.g., summary vs. detailed). This being a binary (on/off) is fine for now, but introducing a toggle for verbosity or controlling granularity will likely be a desired enhancement in the future.
  2. Only push activation frames if profiling or backtracing is enabled. This avoids overhead in case both are disabled.

As this PR goes through the paces I would also like to ensure the following take place:

  • Update the project’s documentation to include usage guidelines for the new backtrace and profiling features.
  • Provide configuration settings to enable or disable backtracing and profiling, allowing users to control the overhead associated with these features.
  • Assess the performance impact of the added features, especially in production environments, and optimize as necessary.
  • Tests should cover scenarios for both backtracing and profiling, including edge cases like nested evaluations and cached results.
  • Simulated errors and exceptions should be tested to verify that backtraces and debug logs provide meaningful insights.

With thorough testing and proper documentation, these features will be valuable additions to the project. Thanks again!

@scymtym
Copy link
Copy Markdown
Contributor Author

scymtym commented Jun 2, 2025

@c-schuler Thank you for your feedback and suggestions. I can address some of the suggested changes by myself but will need more guidance for others.

Introduce tiered profiling levels (e.g., summary vs. detailed). This being a binary (on/off) is fine for now, but introducing a toggle for verbosity or controlling granularity will likely be a desired enhancement in the future.

I can see this as a useful distinction when presenting the results of profiling - the flamegraph-SVG renderer included in the pull request is mainly a demo of one particular result presentation style that works without dependencies and is not too complex (and happened to be sufficient for my use-case). One question could be whether this renderer should be omitted from the pull request so that clients have complete control and responsibility for the presentation of profiling results. None of this has any bearing on runtime performance, of course.

Regarding runtime behavior and the level of activation frames, I'm not sure what a summary level or other finer grained options could look like. The only information in activation frames that is specific to profiling are the start and end times of the respective evaluation and the cache hit/miss bit. The impact on execution time is (presumably) very minor compared to the profiling work of building the call tree and cumulative timings which are controlled by the singular engine option I added.

A related topic is coverage tracking. I remember seeing something about coverage tracking in the codebase but I can't remember where now. Since a flat profile (just call counts, no nesting) also provides coverage tracking, some integration work may be required in that area to avoid multiple mechanisms for the same functionality.

Only push activation frames if profiling or backtracing is enabled. This avoids overhead in case both are disabled.

Since the stack of activation frames replaces the previous stack of windows, activation frames themselves are not optional.

For clients that can tolerate worse backtraces, there could be an option to only push activation frames when absolutely necessary and only with the absolutely necessary information. I doubt the impact will be large, possibly to the point that querying the engine options frequently would negate the savings, but that would have to be measured.

Also, most of the work for backtraces is done only after an error has already occurred by executing code in catch blocks as the stack unwinds, not in advance. Granted, establishing the extra try/catch blocks might in itself have a slight performance impact but again repeatedly querying engine options could be the more expensive operation.

  • Update the project’s documentation to include usage guidelines for the new backtrace and profiling features.

I think I can do that. The documentation would mainly explain how get the backtrace and profiling information, not how to present them. Something like the backtrace in the above screenshots needs some work on the client's part. The flamegraph in the screenshot can be produced with just the code in this pull request - provided we don't decide against having the rendering code in the Profile class.

Provide configuration settings to enable or disable backtracing and profiling, allowing users to control the overhead associated with these features.

Such an option exists for profiling. Activation frames cannot be turned off completely since their main purpose, holding variable bindings, is not optional. Reducing the number of activation frames could be an option, as I mentioned above, if it turns out to have a noticeable performance impact.

Assess the performance impact of the added features, especially in production environments, and optimize as necessary.

I would need help with this one since I don't have access to any relevant production environments.

Tests should cover scenarios for both backtracing and profiling, including edge cases like nested evaluations and cached results.

Since the existing tests exercise those edge cases and activation frames are used throughout, the main missing piece would be running the existing tests with profiling enabled. Any suggestions regarding how I could do that?

Simulated errors and exceptions should be tested to verify that backtraces and debug logs provide meaningful insights.

I can add test cases that validate the information in backtraces.

@scymtym
Copy link
Copy Markdown
Contributor Author

scymtym commented Jun 10, 2025

@c-schuler With the most recent commits, the points you brought up should be mostly addressed:

  • Options for enabled and disabling the features: I explained above that a single engine option should be enough.
  • Documentation: I completed the javadoc documentation of the added classes. I'm not sure whether there are other places in which the features should be documented.
  • Performance Impact: I ran CqlPerformanceTest for three configurations: without my changes, with my changes + profiling disabled and with my changes + profiling enabled. There were no significant differences in execution time. I also fixed some of the messages about unexpected results in CqlPerformanceTest (see #cql chat) to see whether that impacted execution time of the test, but there didn't seem to be a significant impact either.
  • Unit Tests: I added unit tests for the backtrace and profiling features

I haven't pushed the change that allows CqlPerformanceTest to run with profiling enabled since it seems like I should wait for #1556 (See discussion in #cql).

@scymtym scymtym marked this pull request as ready for review June 24, 2025 14:58
@antvaset antvaset self-assigned this Jun 25, 2025
Reported-by: Anton Vasetenkov <antvaset@gmail.com>
scymtym added 4 commits June 26, 2025 11:48
Suggested-by: Anton Vasetenkov <antvaset@gmail.com>
Suggested-by: Anton Vasetenkov <antvaset@gmail.com>
Suggested-by: Anton Vasetenkov <antvaset@gmail.com>
The language level was changed in commit
b55dc7b.

Suggested-by: Anton Vasetenkov <antvaset@gmail.com>
@antvaset
Copy link
Copy Markdown
Contributor

Hi @scymtym , this is so cool and very well done! And thanks for merging in my suggested changes.

Agree with @c-schuler that more toggles will be needed to control future profiling features like verbosity and granularity. In terms of what those toggles could look like, this TODO is captured by #1287 and will be addressed separately and perhaps more holistically. We don't currently have a README for CQL engine options in this repo (only code comments), and I can do that separately as well for all engine options (see new issue #1574).

@scymtym
Copy link
Copy Markdown
Contributor Author

scymtym commented Jun 26, 2025

@antvaset Thank you for the review.

I agree that some way of controlling the profiling feature, that is capturing profiling info as well as processing the finished profile, in more detail will be needed. The ad-hoc flamegraph-SVG writer I included is more a demo than anything. Whether those fleshed out details should take the form of engine options, I'm not really sure but that can be worked later as you suggest.

Copy link
Copy Markdown
Contributor

@c-schuler c-schuler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you!

@JPercival JPercival merged commit 3bb626a into cqframework:master Jun 28, 2025
0 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants