Skip to content

Conversation

@erezrokah
Copy link
Member

@erezrokah erezrokah commented Jun 24, 2024

Summary

Instead of #18360, below are debug logs I've added (in the S3 SDK) to measure the over allocation when uploading batches:

Allocated 5MB, needed 3.66MB 
Allocated 5MB, needed 616.00B 
Allocated 5MB, needed 1.11KB 
Allocated 5MB, needed 21.57KB 
Allocated 5MB, needed 849.00B 
Allocated 5MB, needed 1.09KB 
Allocated 5MB, needed 1.89KB 
Allocated 5MB, needed 4.81KB 
Allocated 5MB, needed 4.22KB 
Allocated 5MB, needed 16.42KB 
Allocated 5MB, needed 54.46KB 
Allocated 5MB, needed 3.98MB 
Allocated 5MB, needed 3.83KB 
Allocated 5MB, needed 302.00B 
Allocated 5MB, needed 1.39KB 
Allocated 5MB, needed 596.00B 
Allocated 5MB, needed 537.00B 
Allocated 5MB, needed 793.00B 
Allocated 5MB, needed 33.50KB 
Allocated 5MB, needed 392.00B 
Allocated 5MB, needed 310.00B 
Allocated 5MB, needed 721.00B 
Allocated 5MB, needed 2.04KB 
Allocated 5MB, needed 1.51KB 
Allocated 5MB, needed 64.49KB 
Allocated 5MB, needed 17.37KB 
Allocated 5MB, needed 1.92KB 
Allocated 5MB, needed 1.02KB 
Allocated 5MB, needed 25.58KB 
Allocated 5MB, needed 1.85MB 
Allocated 5MB, needed 16.24KB 
Allocated 5MB, needed 9.47KB 
Allocated 5MB, needed 1.21KB 
Allocated 5MB, needed 1.65KB 
Allocated 5MB, needed 1.40KB 
Allocated 5MB, needed 1.20KB 
Allocated 5MB, needed 926.00B 
Allocated 5MB, needed 5.98KB 
Allocated 5MB, needed 5.18KB 
Allocated 5MB, needed 10.39KB 
Allocated 5MB, needed 9.20KB 

Copy link
Member

@disq disq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good explanation 🕵🏼‍♂️

@marianogappa
Copy link
Contributor

AWS -> S3 without this fix (aws 50k concurrency, S3 has no concurrency limit)
without erez fix

AWS -> S3 WITH this fix (aws 50k concurrency, S3 has no concurrency limit)
with erez fix and aws 50k concurrency

AWS -> S3 WITH this fix (aws 100k concurrency, S3 has no concurrency limit)
with erez fix and aws 100k concurrency

@erezrokah erezrokah added the automerge Automatically merge once required checks pass label Jun 24, 2024
@kodiakhq kodiakhq bot merged commit 5916c95 into main Jun 24, 2024
@kodiakhq kodiakhq bot deleted the fix/memory_s3 branch June 24, 2024 10:54
Comment on lines +35 to +39
allData, err := io.ReadAll(r)
if err != nil {
return err
}
readerSeeker := bytes.NewReader(allData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use bytes.Buffer + io.Copy instead (might save a bit of cycles)

kodiakhq bot pushed a commit that referenced this pull request Jun 24, 2024
🤖 I have created a release *beep* *boop*
---


## [7.2.4](plugins-destination-s3-v7.2.3...plugins-destination-s3-v7.2.4) (2024-06-24)


### Bug Fixes

* **deps:** Update aws-sdk-go-v2 monorepo ([#18356](#18356)) ([b22a81d](b22a81d))
* **deps:** Update module github.com/cloudquery/filetypes/v4 to v4.2.22 ([#18349](#18349)) ([9f104f1](9f104f1))
* **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.46.0 ([#18341](#18341)) ([5db9574](5db9574))
* **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.46.1 ([#18350](#18350)) ([8ff8909](8ff8909))
* **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.47.0 ([#18351](#18351)) ([9c5bbdc](9c5bbdc))
* **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.47.1 ([#18352](#18352)) ([b31812a](b31812a))
* Pass a reader with seeking support to S3 SDK upload method ([#18361](#18361)) ([5916c95](5916c95))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
kodiakhq bot pushed a commit that referenced this pull request Jul 18, 2024
…#18608)

#### Summary

Follow up to #18361.
When `no_rotate: true` we can get large files to upload and when that happens we'll be holding the whole file in memory and eventually running out of it.
An example is a sync on many AWS accounts (for example 2000), even for a single table. Let's say the table data is 10MB per account, we end up with a file of 20GB.

The downside of this change is that if a sync has a single account but many tables (and `no_rotate: true`), we'll run into the same issue #18361 fixed.

There are workarounds for both issues (either reducing the number of accounts in the sync, or reducing the concurrency).

The best solution would be to get aws/aws-sdk-go-v2#2694 fixed (suggested solution in https://github.com/aws/aws-sdk-go-v2/compare/main...erezrokah:fix/dont_over_allocate_small_files?expand=1), but doesn't seems like it's going to happen soon.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/plugin/destination/s3 automerge Automatically merge once required checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants