fix: excessive data uploads when calling run.save() repeatedly on unchanged files#10639

Merged
dmitryduev merged 16 commits into main from 20251003-dedup-live-uploads on Oct 6, 2025

@dmitryduev dmitryduev commented Oct 3, 2025

Description

Older versions of the SDK (pre-0.18.0) had deduplication logic for saved files that is absent in wandb-core: they tracked the size and modification time of each successfully uploaded file to prevent re-uploads. This PR adds similar logic to wandb-core, but keyed on a hash of the file's content instead of its size and modification time.
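To illustrate the idea of content-based deduplication, here is a minimal Python sketch. It is not the code from this PR (the actual change is in Go under core/internal/runfiles); the `UploadDeduper` class, its method names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return a hex digest of the file's content, reading it in chunks."""
    hasher = hashlib.sha256()  # assumption: the real hash function may differ
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


class UploadDeduper:
    """Hypothetical helper: remembers the digest of the last upload per path."""

    def __init__(self):
        self._uploaded = {}  # maps str(path) -> hex digest of uploaded content

    def needs_upload(self, path):
        """True if the file was never uploaded or its content has changed."""
        return self._uploaded.get(str(path)) != file_digest(path)

    def mark_uploaded(self, path):
        """Record the current content digest after a successful upload."""
        self._uploaded[str(path)] = file_digest(path)
```

Unlike a check based on size and modification time, a file that is merely touched (new modification time, same bytes) is not treated as changed here, at the cost of reading the file to hash it on every check.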

The script used for testing:
import pathlib
from time import sleep

import torch

import wandb

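num_bytes = 1 * 1024 * 1024  # size of each saved file: 1 MiB
num_floats = num_bytes // 4  # float32 takes 4 bytes per element
num_steps = 50
save_interval = 5  # write a new file on every 5th step
num_files = 0  # number of files written so far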

with wandb.init(project="file-sync-test") as run:
    for i in range(num_steps):
        if (i + 1) % save_interval == 0:
            num_files += 1
            print(pathlib.Path(__file__).parent / f"data_{i + 1}.save")
            torch.save(
                torch.rand(num_floats, dtype=torch.float32),
                f"{wandb.run.dir}/data_{i + 1}.save",
                # pathlib.Path(__file__).parent / f"data_{i + 1}.save",
            )
            print(f"{i + 1}/{num_steps} -> Saved file")
        else:
            print(f"{i + 1}/{num_steps}")
        run.save("*.save")  # sync any new files of interest
        sleep(2.0)


print(f"Total traffic should be: ~{num_files * num_bytes / 1024 / 1024} MB")
print(f"Total saved files: {num_files}")

See https://wandb.ai/trustmebro/file-sync-test/.
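As a rough worked example of why repeated `run.save()` calls matter for this script, the sketch below compares the expected traffic with deduplication against a hypothetical worst case in which every call re-uploads every file written so far. The worst case is an assumption for illustration only; the PR does not state the traffic actually observed before the fix.

```python
# Same constants as the test script above.
num_bytes = 1 * 1024 * 1024
num_steps = 50
save_interval = 5

MIB = 1024 * 1024

# With deduplication: each file is uploaded exactly once.
num_files = num_steps // save_interval
with_dedup_mb = num_files * num_bytes / MIB

# Assumed worst case: at step s (1-based), all s // save_interval files
# written so far are uploaded again by that step's run.save() call.
uploads = sum(step // save_interval for step in range(1, num_steps + 1))
worst_case_mb = uploads * num_bytes / MIB

print(f"with dedup: {with_dedup_mb} MB")   # with dedup: 10.0 MB
print(f"worst case: {worst_case_mb} MB")   # worst case: 235.0 MB
```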

  • I updated CHANGELOG.unreleased.md, or it's not applicable

Testing

Added unit tests.

@dmitryduev dmitryduev requested a review from a team as a code owner October 3, 2025 20:08

codecov Bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Review comment threads (all outdated):
  • core/internal/runfiles/saved_file.go (7 threads)
  • core/internal/runfiles/runfiles_test.go (1 thread)
@dmitryduev dmitryduev changed the title from "fix: do not re-upload previously saved files if unchanged" to "fix: excessive data uploads when calling run.save() repeatedly on unchanged files" Oct 6, 2025
@dmitryduev dmitryduev enabled auto-merge (squash) October 6, 2025 17:59
@dmitryduev dmitryduev merged commit 818dba4 into main Oct 6, 2025
38 checks passed
@dmitryduev dmitryduev deleted the 20251003-dedup-live-uploads branch October 6, 2025 18:00
3 participants