
fix!: Rename MaxGoRoutines and enforce it #103

Closed

erezrokah wants to merge 3 commits into cloudquery:main from erezrokah:fix/goroutines

Conversation

Member

@erezrokah erezrokah commented Sep 14, 2022

Summary

When testing the Azure plugin I noticed we print a message on the number of Go routines used. From the other logs it seemed that the number is not enforced (All table fetch start logs were printed at the same time).
Since we can't really enforce a number of Go routines as sometimes we'd like to fetch related resources in parallel and we don't know those in advance, I renamed the configuration to Concurrency as I think we should still have a setting to control the load CloudQuery puts on the APIs (right now it creates a Go routine per top level table).
Also I switched to using channels instead of semaphore as seems more suitable for this use case, and we use channels in other places in the code.
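The channel-based limit described above can be sketched as follows. This is a minimal illustration of the pattern, not the PR's actual code; `fetchAll` and `fetchTable` are hypothetical names. A buffered channel acts as a semaphore: sends block once the buffer is full, capping how many top-level fetches run at once.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchTable stands in for resolving one top-level table.
func fetchTable(name string) string { return "fetched " + name }

// fetchAll resolves every table, but never runs more than
// `concurrency` fetches at the same time.
func fetchAll(tables []string, concurrency int) []string {
	sem := make(chan struct{}, concurrency)
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
	)
	for _, t := range tables {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks when `concurrency` fetches are in flight
		go func(t string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			r := fetchTable(t)
			mu.Lock()
			results = append(results, r)
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(len(fetchAll([]string{"a", "b", "c", "d", "e"}, 2))) // 5
}
```

Compared to a semaphore library, the buffered channel needs no extra dependency and composes with `select` if a non-blocking acquire is ever needed.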


Use the following steps to ensure your PR is ready to be reviewed

  • Read the contribution guidelines 🧑‍🎓
  • Run go fmt to format your code 🖊
  • Lint your changes via golangci-lint run 🚨 (install golangci-lint here)
  • Update or add tests 🧪
  • Ensure the status checks below are successful ✅

}
}

totalResources++
Member Author

@erezrokah erezrokah Sep 14, 2022


It seems clearer to me to count each resource when we send it to the channel, instead of the current approach. Happy to revert, as this is a personal preference.

@github-actions github-actions bot added breaking and removed fix labels Sep 14, 2022
@yevgenypats
Contributor

I'd like to give it a go as well, as I was dealing with a lot of issues around our concurrency and locks. I commented some of the stuff out because of lack of time, and because I encountered some race conditions which I didn't have time to fix at the time.

@github-actions github-actions bot added breaking and removed fix labels Sep 15, 2022
Member Author

erezrokah commented Sep 15, 2022

> I'd like to give it a go as well, as I was dealing with a lot of issues around our concurrency and locks. I commented some of the stuff out because of lack of time, and because I encountered some race conditions which I didn't have time to fix at the time.

💯 I understand, as this is a core change. I think the challenge is that we can't predict how many table.resolve jobs we have: when a table has a relation, we resolve it once per parent resource that was fetched. Hence we can only enforce the number of Go routines at the top level.
The only way I could think of to enforce the max Go routines is, instead of resolving the relations in the same Go routine as the parent, to send each relation as a new job back to the worker channel/queue. The problem with that approach is that we need to keep all the relation jobs in memory until they are scheduled. For example, if a top-level table fetches 10,000 items and has 4 relations, we'll queue 40,000 new jobs.

I deleted the comments, as code in comments usually gets stale and leads to confusion later.
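The fan-out described above can be made concrete with a toy calculation, using the numbers from the comment (`jobsQueued` is a hypothetical helper, not part of the SDK):

```go
package main

import "fmt"

// jobsQueued estimates how many relation jobs the "one level per job"
// design would hold in memory: each fetched parent resource produces
// one job per relation.
func jobsQueued(parentResources, relations int) int {
	return parentResources * relations
}

func main() {
	// The example from the comment: 10,000 items and 4 relations.
	fmt.Println(jobsQueued(10000, 4)) // 40000 jobs queued at once
}
```

With deeper relation trees the growth compounds, since each of those 40,000 jobs can in turn fan out into its own children.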

@yevgenypats
Contributor

> > I'd like to give it a go as well, as I was dealing with a lot of issues around our concurrency and locks. I commented some of the stuff out because of lack of time, and because I encountered some race conditions which I didn't have time to fix at the time.
>
> 💯 I understand, as this is a core change. I think the challenge is that we can't predict how many table.resolve jobs we have: when a table has a relation, we resolve it once per parent resource that was fetched. Hence we can only enforce the number of Go routines at the top level. The only way I could think of to enforce the max Go routines is, instead of resolving the relations in the same Go routine as the parent, to send each relation as a new job back to the worker channel/queue. The problem with that approach is that we need to keep all the relation jobs in memory until they are scheduled. For example, if a top-level table fetches 10,000 items and has 4 relations, we'll queue 40,000 new jobs.
>
> I deleted the comments, as code in comments usually gets stale and leads to confusion later.

I think this still won't solve it and will always risk deadlock. The only way I can see to solve it is with X parameters:

concurrency_relations_1, concurrency_relations_2, and so on. A limit on the number of jobs won't solve it, as child relations can always cause a deadlock for the parent; it doesn't matter whether it's a queue, worker channel, etc.
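The deadlock being described can be sketched: with a bounded job queue that is drained only by the workers themselves, a worker that blocks while enqueuing a child job can wedge the whole pool. Here is a toy demonstration; it uses a non-blocking send (`tryEnqueue` is a hypothetical helper) so the example itself doesn't hang:

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send; it returns false when the
// queue is full, i.e. exactly the situation in which a blocking send
// from inside a worker would deadlock the pool.
func tryEnqueue(jobs chan string, j string) bool {
	select {
	case jobs <- j:
		return true
	default:
		return false
	}
}

func main() {
	const workers = 2
	jobs := make(chan string, workers) // small bounded queue

	// Pretend every worker is busy resolving a parent table, so the
	// queue is already full.
	jobs <- "parent-1"
	jobs <- "parent-2"

	// A worker resolving parent-1 now tries to enqueue its child job.
	// With a blocking send, every worker would be waiting to send into
	// a queue that only workers can drain: a deadlock.
	fmt.Println(tryEnqueue(jobs, "child-of-parent-1")) // false
}
```

This is why a single job limit cannot be safe at every relation depth: once all slots are held by parents, their children have nowhere to go.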

@erezrokah
Member Author

> concurrency_relations_1, concurrency_relations_2

Yeah, I thought about that too, but that's a dynamic number based on the depth of relations, and it's also very hard to expose as a config.

> A limit on the number of jobs won't solve it, as child relations can always cause a deadlock for the parent; it doesn't matter whether it's a queue, worker channel, etc.

I think if you resolve only one level at a time, and each relation is a new job, it won't deadlock, because the parent has already finished running (if we do a non-blocking send or use a very large channel). But again, that leads to memory issues as the number of jobs grows very large.

Hence this PR only implements a limit on the top-level tables and avoids the term Go routines.
If we don't have a limit at all, the only way to control concurrency is to use multiple configuration files with a different set of resources per configuration.

On another note, Azure has 55 top-level tables, so if someone has 4 subscriptions the current code will create 220 Go routines. I'm actually not worried about the number of Go routines, but about the number of API calls and rate limiting.
⬆️ This might not be a problem yet, but it would be nice to be able to control it.


totalResources := 0
for i := 0; i < jobsCount; i++ {
result := <-results
Member


I would make the channels unbuffered and collect in a separate goroutine, to save on memory.

Member Author


I'll make this change once @yevgenypats approves of the new approach
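The reviewer's suggestion (unbuffered result channel drained by a separate collector goroutine) might look like the following sketch; `collect` is a hypothetical name and the shape is an assumption, not the PR's code. Because the channel is unbuffered, each result is handed straight to the collector instead of piling up in a buffer sized to the job count:

```go
package main

import (
	"fmt"
	"sync"
)

// collect runs one goroutine per job plus a concurrent collector that
// sums the results as they arrive.
func collect(jobs []int) int {
	results := make(chan int) // unbuffered: no per-job memory held
	done := make(chan int)

	// Collector: drains results while the workers are still running.
	go func() {
		total := 0
		for r := range results {
			total += r
		}
		done <- total
	}()

	var wg sync.WaitGroup
	for _, j := range jobs {
		wg.Add(1)
		go func(j int) {
			defer wg.Done()
			results <- j // blocks only until the collector receives
		}(j)
	}
	wg.Wait()
	close(results)
	return <-done
}

func main() {
	fmt.Println(collect([]int{1, 2, 3, 4, 5})) // 15
}
```

Closing `results` only after `wg.Wait()` guarantees the collector's `range` loop terminates exactly once all workers have sent.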

Destinations []string `json:"destinations,omitempty"`
Spec interface{} `json:"spec,omitempty"`
Registry Registry `json:"registry,omitempty"`
Concurrency uint64 `json:"concurrency,omitempty"`
Member


I welcome this change as the MaxGoRoutines name is a bit too implementation-detaily.

@erezrokah
Member Author

Closing in favor of #129

@erezrokah erezrokah closed this Sep 19, 2022
@erezrokah erezrokah deleted the fix/goroutines branch September 19, 2022 10:53