@TrafalgarZZZ
Member

Ⅰ. Describe what this PR does

#3756 is caused by concurrent writes to Fluid's metrics maps. Specifically, when multiple goroutines reconcile multiple Runtimes, a deletion and an insertion can happen at the same time on the same metrics map, which triggers fatal error: concurrent map writes.

This PR fixes the issue by adding a Mutex that guards the maps against this race condition.
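
A minimal sketch of the approach, not the actual Fluid code: every insertion and deletion on the map takes the same lock, so two goroutines can no longer write to it at once. The names `metricsStore`, `getOrCreate`, `forget`, `size` and `churn` are hypothetical and chosen for illustration only, and `datasetMetrics` here is a simplified stand-in for the real type.

```go
package main

import (
	"fmt"
	"sync"
)

// datasetMetrics is a simplified, hypothetical stand-in for a per-dataset metrics entry.
type datasetMetrics struct {
	key string
}

// metricsStore pairs a plain map with the mutex that guards it.
type metricsStore struct {
	mu sync.Mutex
	m  map[string]*datasetMetrics
}

func newMetricsStore() *metricsStore {
	return &metricsStore{m: map[string]*datasetMetrics{}}
}

// getOrCreate returns the entry for key, inserting it first if it is missing.
func (s *metricsStore) getOrCreate(key string) *datasetMetrics {
	s.mu.Lock()
	defer s.mu.Unlock()
	if dm, ok := s.m[key]; ok {
		return dm
	}
	dm := &datasetMetrics{key: key}
	s.m[key] = dm
	return dm
}

// forget deletes the entry for key, if any.
func (s *metricsStore) forget(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, key)
}

// size reports the number of entries currently stored.
func (s *metricsStore) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.m)
}

// churn simulates several reconcilers inserting and deleting entries concurrently.
func churn(s *metricsStore, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("ns/dataset-%d", i%4)
			s.getOrCreate(key)
			s.forget(key)
		}(i)
	}
	wg.Wait()
}

func main() {
	s := newMetricsStore()
	churn(s, 64) // without the mutex this can abort with "concurrent map writes"
	s.getOrCreate("default/demo")
	fmt.Println("entries left:", s.size())
}
```

Running a program like this with `go run -race` is one way to check that the unguarded variant is flagged and the guarded one is not.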

Ⅱ. Does this pull request fix one issue?

fixes #3756

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
@codecov

codecov bot commented Mar 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.19%. Comparing base (4d44dcf) to head (6ac78b1).
Report is 2 commits behind head on master.

❗ Current head 6ac78b1 differs from pull request most recent head 7be99d2. Consider uploading reports for the commit 7be99d2 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3757   +/-   ##
=======================================
  Coverage   64.19%   64.19%           
=======================================
  Files         474      474           
  Lines       28236    28236           
=======================================
  Hits        18127    18127           
  Misses       7945     7945           
  Partials     2164     2164           


func init() {
	metrics.Registry.MustRegister(datasetUFSFileNum, datasetUFSTotalSize)
	datasetMetricsMap = map[string]*datasetMetrics{}
	datasetMetricsMutex = &sync.Mutex{}
Collaborator

Why not use &sync.RWMutex{} to separate reads from writes?

Member Author

The two operations involved are insertion and deletion. There is no read operation on the metrics map.

Member Author

I see what you mean. RWMutex lgtm, will change it.
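
For readers following this thread, here is a rough sketch of the difference under discussion. With sync.RWMutex, insertions and deletions still take the exclusive lock, so they are serialized exactly as with sync.Mutex; the benefit only shows up on lookup paths, which can share the read lock. The type `rwStore` and its methods are hypothetical names for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// rwStore is a hypothetical map wrapper guarded by a read-write mutex.
type rwStore struct {
	mu sync.RWMutex
	m  map[string]float64
}

func newRWStore() *rwStore {
	return &rwStore{m: map[string]float64{}}
}

// set inserts or overwrites a value; writers take the exclusive lock.
func (s *rwStore) set(key string, v float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = v
}

// remove deletes a value; deletion is also a write and needs the exclusive lock.
func (s *rwStore) remove(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, key)
}

// get looks up a value; readers share the lock with other readers.
func (s *rwStore) get(key string) (float64, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	s := newRWStore()
	s.set("default/demo", 42)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.get("default/demo") // readers may overlap with one another
		}()
	}
	wg.Wait()

	s.remove("default/demo")
	_, ok := s.get("default/demo")
	fmt.Println("still present:", ok)
}
```

If the map is only ever inserted into and deleted from, as described above, both lock types behave the same in practice; RWMutex mainly leaves room for cheap concurrent lookups later.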

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
@sonarqubecloud

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
12.7% Duplication on New Code


Collaborator

@cheyang left a comment

/lgtm
/approve

@fluid-e2e-bot

fluid-e2e-bot bot commented Mar 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fluid-e2e-bot bot merged commit 67b8c8c into fluid-cloudnative:master on Mar 12, 2024
dashanji pushed a commit to dashanji/fluid that referenced this pull request Apr 7, 2024
…-cloudnative#3757)

* Fix fatal error: concurrent map writes

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>

* Use RWMutex to separate read and write

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>

* Use sync.Map to protect metrics race condition

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>

---------

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
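
The last commit in the list above mentions sync.Map. A minimal sketch of what that style of fix can look like, assuming a package-level map keyed by a string; `metricsByKey`, `getOrCreate`, `forget` and `count` are hypothetical names, and `datasetMetrics` is a simplified stand-in rather than the real Fluid type:

```go
package main

import (
	"fmt"
	"sync"
)

// datasetMetrics is a simplified, hypothetical stand-in for a metrics entry.
type datasetMetrics struct {
	labelKey string
}

// metricsByKey is safe for concurrent use without an external lock.
var metricsByKey sync.Map

// getOrCreate atomically returns the existing entry or stores a new one.
func getOrCreate(key string) *datasetMetrics {
	entry, _ := metricsByKey.LoadOrStore(key, &datasetMetrics{labelKey: key})
	return entry.(*datasetMetrics)
}

// forget removes the entry for key, if any.
func forget(key string) {
	metricsByKey.Delete(key)
}

// count walks the map to count entries; sync.Map has no Len method.
func count() int {
	n := 0
	metricsByKey.Range(func(_, _ interface{}) bool {
		n++
		return true
	})
	return n
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("ns/dataset-%d", i%4)
			getOrCreate(key)
			forget(key)
		}(i)
	}
	wg.Wait()

	getOrCreate("default/demo")
	fmt.Println("entries:", count())
}
```

Compared with a map plus mutex, sync.Map trades static typing (values come back as interface{}) for lock-free call sites, and LoadOrStore closes the gap between checking for an entry and inserting it.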

Development

Successfully merging this pull request may close these issues.

[BUG] Runtime controller exits with fatal error: concurrent map writes