Skip to content

test(NODE-5028): assert MongoClients are garbage collectable#3553

Merged
dariakp merged 26 commits intomainfrom
NODE-5028-add-leak-repro
Feb 10, 2023
Merged

test(NODE-5028): assert MongoClients are garbage collectable#3553
dariakp merged 26 commits intomainfrom
NODE-5028-add-leak-repro

Conversation

@nbbeeken
Copy link
Copy Markdown
Contributor

@nbbeeken nbbeeken commented Feb 2, 2023

Description

Using both self process memory reporting and heapsnapshots, we can check that after 100 clients have been created and closed they are removed from memory by the GC.

What is changing?

Is there new documentation needed for these changes?

What is the motivation for this change?

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-5028-add-leak-repro branch from 3a2ee67 to 7dad424 Compare February 2, 2023 19:32
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
@nbbeeken nbbeeken requested a review from addaleax February 2, 2023 20:25
Copy link
Copy Markdown
Contributor

@baileympearson baileympearson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use AsyncResources for this.

If we attach an async resource to each mongo client and enable async hooks, then we can track when the async resources (and by proxy, the MongoClient) are actually garbage collected without relying on measuring the heap size.

@nbbeeken nbbeeken marked this pull request as ready for review February 2, 2023 22:44
Copy link
Copy Markdown
Contributor

@baileympearson baileympearson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is pretty red, a number of failures are unrelated (the fle failures on node18 appear on other PRs, I wonder if Node18.14 broke that test)

Should this test be a part of the revert PRs maybe? Otherwise, we can't demonstrate with a test that the revert fixes the memory lead issue, and we can't merge this first because it will fail without the revert.

Also, the async resource tracking always records exactly one client (even with the revert). If we keep the async hooks logic, we'll need to update the assertion (it's currently asserting against 0). But should we keep that logic if we don't know why exactly one mongo client is always reported as being left?

@nbbeeken
Copy link
Copy Markdown
Contributor Author

nbbeeken commented Feb 3, 2023

We should merge the reverts in isolation for organizational sake, and then we can merge this when it's able to be green. I can post an evg patch showing this running against the driver with the revert

@nbbeeken
Copy link
Copy Markdown
Contributor Author

nbbeeken commented Feb 3, 2023

@nbbeeken nbbeeken force-pushed the NODE-5028-add-leak-repro branch from ba7df9b to 81b3222 Compare February 6, 2023 17:54
@durran durran added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Feb 6, 2023
Copy link
Copy Markdown
Contributor

@durran durran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @baileympearson 's comment as well - I feel like this is overly complex. Can we remove all the heap size checking and just create multiple MongoClients, run some operations on them and close them and then assert they can be released?

@nbbeeken
Copy link
Copy Markdown
Contributor Author

nbbeeken commented Feb 7, 2023

We can't be sure that the tracking is accurate because as the test shows it is still showing one MongoClient in memory, while the heap snapshot shows none. So which is the true source of truth?

@durran
Copy link
Copy Markdown
Contributor

durran commented Feb 7, 2023

We can't be sure that the tracking is accurate because as the test shows it is still showing one MongoClient in memory, while the heap snapshot shows none. So which is the true source of truth?

Yeah that's really odd and has the feeling it could be a flaky test in the future as well as checking the difference between start and end memory... Maybe we only need the test on line 137 and that's enough?

@nbbeeken
Copy link
Copy Markdown
Contributor Author

nbbeeken commented Feb 7, 2023

I do permit a wiggle room of 3 megabytes which is pretty friendly but that was just a guesstimate, we could make the range much larger to avoid flakes. The leak was growing unbounded easily reaching 90+ MB with 100 iterations of client.connect/close so we could do a range of 20MB? it just depends on how much allocation would be beyond our expected amount (relative to startup amount)

@nbbeeken nbbeeken requested a review from durran February 8, 2023 15:26
durran
durran previously approved these changes Feb 8, 2023
@durran durran added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Feb 8, 2023
@nbbeeken nbbeeken requested a review from durran February 8, 2023 18:33
durran
durran previously approved these changes Feb 8, 2023
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
const mongoClient = new MongoClient(uri);
await mongoClient.connect();
const db = mongoClient.db();
await Promise.all([
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why db.collections() and why this many times? also, are we interested in ensuring no leak when there is only one client or should we test multiples? (your helper script runs multiple iterations, but I am trying to understand what the actual scenario under test is)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation is arbitrary, the parallelism is to get "some" number of connections in the connection pool, this bug does reproduce without this though, I can make it one db.collections call

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok but why db.collections? why isn't the connect call enough? if you just want connections in the connection pool, then setting a minpoolsize feels like a more straightforward approach

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I minimized the reproduction more, it's a connect and ping, although I guess the connect is redundant with any operation. I tried only setting a minPoolSize, but the leak stopped reproducing so it would seem that running an operation over the connections is required.

@baileympearson FYI ^

Comment thread test/integration/node-specific/resource_tracking_script_builder.ts Outdated
Comment thread test/integration/node-specific/resource_tracking_script_builder.ts Outdated
Comment thread test/integration/node-specific/resource_tracking_script_builder.ts

// If any of the above throws we won't reach these unlinks that clean up the created files.
// This is intentional so that when debugging the file will still be present to check it for errors
// await unlink(scriptName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidentally commented out?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you 😅

Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
Comment thread test/integration/node-specific/resource_clean_up.test.ts Outdated
nbbeeken and others added 2 commits February 10, 2023 11:24
Co-authored-by: Daria Pardue <daria.pardue@mongodb.com>
@nbbeeken nbbeeken requested a review from dariakp February 10, 2023 17:02
@dariakp dariakp merged commit 4bac63c into main Feb 10, 2023
@dariakp dariakp deleted the NODE-5028-add-leak-repro branch February 10, 2023 20:18
@nbbeeken nbbeeken restored the NODE-5028-add-leak-repro branch February 10, 2023 20:26
@nbbeeken nbbeeken deleted the NODE-5028-add-leak-repro branch February 10, 2023 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Team Review Needs review from team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants