
Refactor get_XXX_dataloader from Trainer #38090

Merged
SunMarc merged 3 commits into huggingface:main from yaswanth19:exclude-test-dataloader on May 19, 2025

Conversation

@yaswanth19

@yaswanth19 yaswanth19 commented May 12, 2025

Contributor

What does this PR do?

Refactors dataloader functions in Trainer

@github-actions github-actions Bot marked this pull request as draft May 12, 2025 17:42
@github-actions

Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@yaswanth19 yaswanth19 marked this pull request as ready for review May 12, 2025 17:44
@Rocketknight1

Member

cc @SunMarc!

@SunMarc SunMarc left a comment

Member

Thanks for the proposal, @yaswanth19. Since these methods have been here for a long time, removing one could break code that doesn't process the train / test / eval datasets the same way. We can definitely make this easier to maintain by defining some utility functions, but I'd prefer not to remove these methods. Also, get_eval_dataloader is a bit different from get_test_dataloader.
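
To make the compatibility concern concrete, here is a minimal sketch. All classes below are hypothetical stand-ins written for illustration, not the real `Trainer`: a downstream mixin overrides `get_test_dataloader`, and if the base class stopped routing prediction through that hook, the override would be bypassed without any error.

```python
class BaseTrainerV1:
    """Hypothetical stand-in: separate hooks for eval and test data."""

    def get_eval_dataloader(self, dataset):
        return list(dataset)

    def get_test_dataloader(self, dataset):
        return list(dataset)

    def predict(self, dataset):
        # Prediction is routed through the test hook.
        return self.get_test_dataloader(dataset)


class BaseTrainerV2(BaseTrainerV1):
    """Hypothetical 'after removal' variant: predict() uses the eval hook."""

    def predict(self, dataset):
        return self.get_eval_dataloader(dataset)


class UserTrainerMixin:
    """Downstream customization: test data is processed differently."""

    def get_test_dataloader(self, dataset):
        return [x * 10 for x in dataset]


class UserTrainerV1(UserTrainerMixin, BaseTrainerV1):
    pass


class UserTrainerV2(UserTrainerMixin, BaseTrainerV2):
    pass


before = UserTrainerV1().predict([1, 2, 3])  # the user's override is applied
after = UserTrainerV2().predict([1, 2, 3])   # the user's override is never called
print(before, after)
```

In this sketch nothing fails loudly in the second case; the downstream customization simply stops taking effect, which is the kind of silent break the comment above warns about.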

@yaswanth19

Contributor Author

> since this is something that was here for a long time, removing this can actually be breaking for some code that doesn't process the train / test / eval dataset the same way. We can definitely make it easier to maintain by defining some utility functions but as for removing these methods, I prefer not to do that. Also get_eval_dataloader is a bit different from get_test_dataloader

Thanks @SunMarc, I do agree that this might be a breaking change for some other codebases.

I don't see a scenario where I would expect my eval_dataloader and test_dataloader to behave differently, and the core logic seems to be the same to me 😅. Perhaps we can keep get_test_dataloader but call get_eval_dataloader inside it to simplify the redundant logic. WDYT?

@SunMarc

SunMarc commented May 13, 2025

Member

In any case, we will keep both functions. I don't think it makes sense to use get_eval_dataloader inside get_test_dataloader. Maybe you can create a _get_dataloader or _process_dataloader function that you use in both methods to reduce the redundant parts.
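
The shape of that suggestion can be sketched as follows. This is an illustrative outline only: `MiniTrainer`, its constructor arguments, and the list-based batching are assumptions made up for this example and are not the code merged in this PR. The point is that each public `get_*_dataloader` method stays in place as an overridable hook, while the shared work moves into one private helper.

```python
class MiniTrainer:
    """Hypothetical stand-in for a trainer; not the real Trainer API."""

    def __init__(self, train_dataset=None, eval_dataset=None,
                 train_batch_size=2, eval_batch_size=4):
        self.train_dataset = train_dataset
        self.eval_dataset = eval_dataset
        self.train_batch_size = train_batch_size
        self.eval_batch_size = eval_batch_size

    def _get_dataloader(self, dataset, batch_size, drop_last=False):
        """Shared logic: split a sequence into batches (plain lists here)."""
        batches = [
            list(dataset[i:i + batch_size])
            for i in range(0, len(dataset), batch_size)
        ]
        # Optionally drop a trailing batch that is smaller than batch_size.
        if drop_last and batches and len(batches[-1]) < batch_size:
            batches.pop()
        return batches

    def get_train_dataloader(self):
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        return self._get_dataloader(
            self.train_dataset, self.train_batch_size, drop_last=True
        )

    def get_eval_dataloader(self, eval_dataset=None):
        dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
        if dataset is None:
            raise ValueError("Trainer: evaluation requires an eval_dataset.")
        return self._get_dataloader(dataset, self.eval_batch_size)

    def get_test_dataloader(self, test_dataset):
        # Still a separate public method, so subclasses can override it.
        return self._get_dataloader(test_dataset, self.eval_batch_size)


trainer = MiniTrainer(train_dataset=[0, 1, 2, 3, 4], eval_dataset=[10, 11, 12])
train_batches = trainer.get_train_dataloader()
eval_batches = trainer.get_eval_dataloader()
test_batches = trainer.get_test_dataloader([20, 21, 22, 23, 24])
print(train_batches, eval_batches, test_batches)
```

Because the three public methods only differ in which dataset and settings they pass to the helper, subclasses that override one of them keep working, and the duplicated batching logic lives in a single place.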

@yaswanth19 yaswanth19 changed the title from "Remove get_test_dataloader from Trainer" to "Refactor get_XXX_dataloader from Trainer" May 13, 2025
@yaswanth19

Contributor Author

@SunMarc Please review 🤗. I haven't added any tests as, IMO, this should already be covered by the tests for get_XX_dataloader.

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc left a comment

Member

Thanks a lot for this refactor! This is much better!

@SunMarc SunMarc merged commit 6bb6821 into huggingface:main May 19, 2025
20 checks passed
xvyv99 pushed a commit to xvyv99/transformers that referenced this pull request May 19, 2025
faaany pushed a commit to faaany/transformers that referenced this pull request May 21, 2025

4 participants