fix!: Rename MaxGoRoutines and enforce it#103
fix!: Rename MaxGoRoutines and enforce it#103erezrokah wants to merge 3 commits intocloudquery:mainfrom
Conversation
| } | ||
| } | ||
|
|
||
| totalResources++ |
There was a problem hiding this comment.
It seems more clear to me to count each resource at the time we send it to the channel instead of the current approach. Happy to revert as this is a personal opinion
|
I'd like to give it a go as well as I was dealing with a lot of issues around our concurrency and locks. I commented some of the stuff out because of lack of time and because I encountered some race conditions which I didn't have time to fix at the time. |
💯 I understand as this is a core change. I think the challenge is that we can't predict how many
|
I think this still wont solve it and will always cause deadlock. The only way I see is possible to solve it is with X parameters:
|
Yeah I thought about that too, but that's a dynamic number based on the depth of relations and also very hard to expose as a config.
I think if you resolve only 1 level each time and each relation is a new job it won't deadlock as the parent already finished running (if we do a non blocking send or use a very big channel). But again that leads to memory issues as the number of jobs grows very high. Hence this PR only implements a limit at the top level tables and avoids the term Go routines. On another note Azure has 55 top level tables, and if someone has 4 subscriptions the current code will create 220 Go routines. I'm actually not worried about the number of Go routines, but on the number of API calls and rate limiting. |
|
|
||
| totalResources := 0 | ||
| for i := 0; i < jobsCount; i++ { | ||
| result := <-results |
There was a problem hiding this comment.
I would make the channels unbuffered and collect in a separate goroutine, to save on memory.
There was a problem hiding this comment.
I'll make this change once @yevgenypats approves of the new approach
| Destinations []string `json:"destinations,omitempty"` | ||
| Spec interface{} `json:"spec,omitempty"` | ||
| Registry Registry `json:"registry,omitempty"` | ||
| Concurrency uint64 `json:"concurrency,omitempty"` |
There was a problem hiding this comment.
I welcome this change as the MaxGoRoutines name is a bit too implementation-detaily.
81d8f02 to
9cf9fa0
Compare
|
Closing in favor of #129 |
Summary
When testing the Azure plugin I noticed we print a message on the number of Go routines used. From the other logs it seemed that the number is not enforced (All table fetch start logs were printed at the same time).
Since we can't really enforce a number of Go routines as sometimes we'd like to fetch related resources in parallel and we don't know those in advance, I renamed the configuration to
Concurrencyas I think we should still have a setting to control the load CloudQuery puts on the APIs (right now it creates a Go routine per top level table).Also I switched to using channels instead of semaphore as seems more suitable for this use case, and we use channels in other places in the code.
Use the following steps to ensure your PR is ready to be reviewed
go fmtto format your code 🖊golangci-lint run🚨 (install golangci-lint here)