tests: refactor breakdown metrics tests to not override Agent internals#2316
tests: refactor breakdown metrics tests to not override Agent internals#2316
Conversation
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪 |
… diff enough in node v10 that it loses that race reliably
Saw this in CI:
# does not include transaction breakdown when disabled
/home/runner/work/apm-agent-nodejs/apm-agent-nodejs/lib/metrics/reporter.js:56
this._agent._transport.sendMetricSet(metric)
^
TypeError: Cannot read property 'sendMetricSet' of null
at afterAll (/home/runner/work/apm-agent-nodejs/apm-agent-nodejs/lib/metrics/reporter.js:56:31)
at /home/runner/work/apm-agent-nodejs/apm-agent-nodejs/node_modules/after-all-results/index.js:20:25
at _combinedTickCallback (internal/process/next_tick.js:131:7)
|
|
||
| for (const metric of seen.values()) { | ||
| this._agent._transport.sendMetricSet(metric) | ||
| if (this._agent._transport) { |
There was a problem hiding this comment.
Reviewer note: This was added to not crash in the metrics reporting interval if, in the interim, agent.destroy() has been called -- which sets this._agent._transport to null. Similar is already done for sendSpan here:
apm-agent-nodejs/lib/instrumentation/index.js
Line 268 in ec719c2
astorm
left a comment
There was a problem hiding this comment.
On objection to this -- it does what it says on the tin and it's hard to argue with better isolated tests. Approving.
One thing I did notice -- in this new branch the breakdown.test.js tests take significantly longer to run. On my local dev machine the test/metrics/breakdown.test.js tests took 14.345 total seconds where the old suites took 0.369 seconds. This was using npx to invoke the tests via tape
With the new branch
$ time npx tape test/metrics/breakdown.test.js
// ...
ok 101 does not have un-ended spans
1..101
# tests 101
# pass 101
# ok
$ npx tape test/metrics/breakdown.test.js 0.42s user 0.08s system 3% cpu 14.345 total
Old/current master
$ time npx tape test/metrics/breakdown.test.js
// ...
ok 101 does not have un-ended spans
1..101
# tests 101
# pass 101
# ok
$ npx tape test/metrics/breakdown.test.js 0.33s user 0.06s system 104% cpu 0.369 total
Not a deal breaker (it's stills seconds, not minutes), just an observation.
Yes, that is unfortunate. The earlier test model of "do test assertions and end as soon as I get the expected number of items on the transport" is faster, because the items we are waiting for (the encoded span and the breakdown metrics) are sent (via The test model I switched to here is:
This model is, IMHO, less error prone, as mentioned in the PR description. However, that step 2 is in this case way slower because the only "API available on the public API" to know that we've gotten the breakdown metrics for all ended transactions and spans is "wait for the next metricsInterval". I think eventually we'll want our agent.flush() or similar API to improve on this (#2294). So yes, this adds ~14s to our test run of ~15 minutes (in CI: https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/job/master/), which I think should be okay. Eventually if/when we can move to |
…ls (elastic#2316) This refactors breakdown.test.js to not need a resetAgent(...) that knows so much about Agent internals. Now that we have a usable agent.destroy(), use that and create a new Agent for each test case. Unfortunately the change results in this test file running much slower. See PR discussion for justification. Also, protect against against a possible crash: TypeError: Cannot read property 'sendMetricSet' of null in metrics/reporter.js if `agent.destroy()` has been called.
This refactors breakdown.test.js to not need a
resetAgent(...)that knows so much about Agent internals. Now that we have a usableagent.destroy(), use that and create a new Agent for each test case.This also moves away from the test/_mock_http_client.js testing style of waiting for an expected number of sent events to decide when to end a test case. This style is brittle when there is a test failure: either hanging waiting for an event that isn't coming, or crashing with
throw new Error('too many writes').