Skip to content

ts: fix tenant metrics scan in tsdb#99511

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
aadityasondhi:20230324.fix-tsdb-scan-over-tenants
Mar 25, 2023
Merged

ts: fix tenant metrics scan in tsdb#99511
craig[bot] merged 1 commit intocockroachdb:masterfrom
aadityasondhi:20230324.fix-tsdb-scan-over-tenants

Conversation

@aadityasondhi
Copy link
Copy Markdown
Contributor

In #98077, we introduced tenant scoped metrics into tsdb. These metric were part of the source field used for NodeIDs previously. Since the intended purpose for the system tenant was to be able to aggregate across all tenant metrics, we introduced a new scan over the entire keyspace for the given Node source. This introduced a bug:

When we prefix match the source to scan over its keyspace, it includes all sources that match that prefix. For a 10 node cluster with 3 tenants, this meant fetching timeseries for Node 1 would include:

1
1-2
1-3
10
10-2
10-3

This would then aggregate data across Node 1 and Node 10. If there were no secondary tenants, it would include the sources:

1
10

This fix separates the get of the initial system tenant metric and scan across all tenants (with a tenant prefix) to only scan the desired source. Now in the same 10 Node cluster with 3 tenants, it will include:

1
1-2
1-3

The scan is now from the beginning of 1- to its end, ensuring only Node 1's tenants are included and not Node 10 or 100 or 1000.

Fixes: #99486

Release note: None

In cockroachdb#98077, we introduced
tenant scoped metrics into tsdb. The TenantIDs were part of the source
field used for NodeIDs previously. Since the intended purpose for the
system tenant was to be able to aggregate across all tenant metrics, we
introduced a new scan over all the sources for the given Node source.

This introduced a bug:
When we prefix match the source to scan over its keyspace, it includes
all sources that match that prefix. For a 10 node cluster with 3
tenants, this meant fetching timeseries for Node 1 would include:
```
1
1-2
1-3
10
10-2
10-3
```
This would then aggregate data across Node 1 and Node 10. If there were
no secondary tenants, it would include the soruces:
```
1
10
```

This fix separates the get of the initial system tenant metric and scan
across all tenants (with a tenant prefix) to only scan the desired
source. Now in the same 10 Node cluster with 3 tenants, it will include:
```
1
1-2
1-3
```
The scan is now from the beginning of `1-` to its end, ensuring only
Node 1's tenants are included and not Node 10 or 100 or 1000.

Fixes: cockroachdb#99486

Release note: None
@aadityasondhi aadityasondhi requested review from a team and dhartunian March 24, 2023 19:40
@aadityasondhi aadityasondhi requested a review from a team as a code owner March 24, 2023 19:40
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@aadityasondhi aadityasondhi added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Mar 24, 2023
Copy link
Copy Markdown
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi)

@aadityasondhi
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 24, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 24, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 25, 2023

Build succeeded:

@craig craig bot merged commit 0d84ad5 into cockroachdb:master Mar 25, 2023
@aadityasondhi aadityasondhi deleted the 20230324.fix-tsdb-scan-over-tenants branch March 31, 2023 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tsdb: metrics graphs reporting abnormally high values

3 participants