
fix: Bring concurrency back #129

Merged
yevgenypats merged 8 commits into main from fix/concurrency on Sep 19, 2022

Conversation

Contributor

@yevgenypats yevgenypats commented Sep 19, 2022

We had concurrency limits disabled. This fixes that, and also fixes the issue we had with timings (the reported time now shows how long a goroutine actually spent fetching, excluding wait time).

This should work instead of #103


Use the following steps to ensure your PR is ready to be reviewed

  • Read the contribution guidelines 🧑‍🎓
  • Run go fmt to format your code 🖊
  • Lint your changes via golangci-lint run 🚨 (install golangci-lint here)
  • Update or add tests 🧪
  • Ensure the status checks below are successful ✅

Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Member

@erezrokah erezrokah left a comment


This looks much better 🚀 Added a few comments

@@ -127,24 +132,22 @@ func (p *SourcePlugin) Sync(ctx context.Context, spec specs.Source, res chan<- *

// limiter used to limit the amount of resources fetched concurrently
maxGoroutines := spec.MaxGoRoutines
Member


How about changing MaxGoRoutines to Concurrency? This will give us some flexibility in how we implement it. For example, we might want to add additional parallelism for specific resources/relations.

Contributor Author


done

const minGoRoutines = 5
const (
minGoRoutines = 5
defaultMaxGoRoutines = 500000
Member


Suggested change
defaultMaxGoRoutines = 500000

Does this mean that by default we spin up 500000 goroutines? Also, what's the reason for having a hard min? Shouldn't we have only GoRoutinesCount and ensure it's not < 0 (0 meaning the default value)?

Contributor Author


Yeah, I guess you're right. It was from an older piece of code where we had an issue with 1-4 goroutines.
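The scheme suggested above could be sketched like this: no hard minimum, a single default, and any non-positive value falls back to the default. The constant mirrors the 500000 default from the diff; the function name is illustrative, not the plugin's actual API.

```go
package main

import "fmt"

// defaultConcurrency mirrors the 500000 default shown in the diff above.
const defaultConcurrency = 500000

// normalizeConcurrency is a hypothetical helper: 0 (or a negative value)
// means "use the default"; any explicit positive value is kept as-is.
func normalizeConcurrency(requested int) int {
	if requested <= 0 {
		return defaultConcurrency
	}
	return requested
}

func main() {
	fmt.Println(normalizeConcurrency(0))  // prints 500000 (unset -> default)
	fmt.Println(normalizeConcurrency(10)) // prints 10 (explicit value kept)
}
```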

Member

@erezrokah erezrokah left a comment


Approved with a comment on the default var name.

Also, this is a breaking change, so it's better to merge it as fix!:.

Finally, I think the current default effectively means "unlimited". Not sure that's the best default, as we might hit rate limits quite fast.

yevgenypats and others added 2 commits September 19, 2022 14:56
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
@yevgenypats
Contributor Author

> Approved with a comment on the default var name.
>
> Also, this is a breaking change, so it's better to merge it as fix!:.
>
> Finally, I think the current default effectively means "unlimited". Not sure that's the best default, as we might hit rate limits quite fast.

Thanks! The default means 500K, which I think is reasonable for 1 CPU / 4GB RAM, which should cover pretty much anyone.

@erezrokah
Member

> The default means 500K, which I think is reasonable for 1 CPU / 4GB RAM, which should cover pretty much anyone.

I should have explained it better. I'm not worried about the machine specs, but about being throttled/rate limited by the service API.

@yevgenypats
Contributor Author

> The default means 500K, which I think is reasonable for 1 CPU / 4GB RAM, which should cover pretty much anyone.

> I should have explained it better. I'm not worried about the machine specs, but about being throttled/rate limited by the service API.

I think concurrency is the right parameter mainly to deal with machine specs, too many open files, and so on; max_back_off and max_retries are usually the right params to deal with throttling on the server side.

@yevgenypats yevgenypats merged commit 04c7f49 into main Sep 19, 2022
@yevgenypats yevgenypats deleted the fix/concurrency branch September 19, 2022 12:15