Skip to content

config: Remove validation GetScrapeConfigs to avoid race and writes.#15634

Closed
bwplotka wants to merge 1 commit intomainfrom
fix15538
Closed

config: Remove validation GetScrapeConfigs to avoid race and writes.#15634
bwplotka wants to merge 1 commit intomainfrom
fix15538

Conversation

@bwplotka
Copy link
Member

We should never modify (or even shallow copy) Config after config.Load; added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently in tests and ApplyConfig causing test races. In prod there were races but likelyt only to replace 0 with 0, so not too severe.

I remove validation since I don't see anyone using our config.Config without Load or UnmarshalYAML.

We can add another validation method if there are use cases for this.

Fixes #15538

Alternatives:

  • Add config mutex for GetScrapeConfigs (more complex and slow)
  • Copy Config (Config struct is huge and required complex deep copy)
  • Second validate is noop
    • e.g. by checking if global is 0 (prone to errors in future)
    • e.g. once.Sync validation - this introduces state and mutex, let's avoid it until we know it's needed (multiple ways of loading config)


// SetDirectory joins any relative file paths with dir.
func (c *Config) SetDirectory(dir string) {
// setDirectory joins any relative file paths with dir.
Copy link
Member Author

@bwplotka bwplotka Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed so it's clear no one should use it outside of load (it writes).

If there will be users who need it outside of Prometheus, we can make it public AND looks on how things are used to ensure callers validate properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the rest of the SetDirectory functions are still capitalized, which may seem inconsistent. The Set clearly indicates that it modifies things. Maybe we could just clarify this further with a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, agree, I don't think we can do that (force all to treat config in read-only mode).

for i, scfg := range c.ScrapeConfigs {
// We do these checks for library users that would not call validate in
// Unmarshal.
if err := scfg.Validate(c.GlobalConfig); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing race (e.g. this writes to scrapeConfig.TargetLimit = global and the same scrapeConfig.TargetLimit is read on scrape loop)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @roidelapluie as you added this, in case you rely on it.

We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were races but likelyt only to replace 0 with 0, so not too severe.

I remove validation since I don't see anyone using our config.Config without Load or UnmarshalYAML.

We can add another validation method if there are use cases for this.

Fixes #15538

Alternatives:
* Add config mutex for GetScrapeConfigs (more complex and slow)
* Copy Config (Config struct is huge and required complex deep copy)
* Second validate is noop
  * e.g. by checking if global is 0 (prone to errors in future)
  * e.g. once.Sync validation - this introduces state and mutex, let's avoid it until we know it's needed (multiple ways of loading config)


Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member Author

OK, so test failures confirms another pattern, callers crafting their own cfg *config.Config and then calling GetScrapeConfig.. we might want to add extra helpers to validate things...

Copy link
Member

@machine424 machine424 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, it's indeed cleaner and clearer.


// SetDirectory joins any relative file paths with dir.
func (c *Config) SetDirectory(dir string) {
// setDirectory joins any relative file paths with dir.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the rest of the SetDirectory functions are still capitalized, which may seem inconsistent. The Set clearly indicates that it modifies things. Maybe we could just clarify this further with a comment?

for i, scfg := range c.ScrapeConfigs {
// We do these checks for library users that would not call validate in
// Unmarshal.
if err := scfg.Validate(c.GlobalConfig); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @roidelapluie as you added this, in case you rely on it.

@bwplotka
Copy link
Member Author

Ok, I am changing the approach to mutex/sync.Once validation on GetScrapeConfigs, the easiest for now.

@bwplotka bwplotka closed this Dec 11, 2024
bwplotka added a commit that referenced this pull request Dec 11, 2024
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.

I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.


Fixes #15538
Previous attempt: #15634

Signed-off-by: bwplotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this pull request Dec 11, 2024
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.

I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.

Fixes #15538
Previous attempt: #15634

Signed-off-by: bwplotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this pull request Dec 11, 2024
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.

I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.

Fixes #15538
Previous attempt: #15634

Signed-off-by: bwplotka <bwplotka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

data race in TestOnlyProviderStaleTargetsAreDropped

2 participants