
[Fix] training time validation bug fix#611

Merged
mzr1996 merged 10 commits into open-mmlab:master from 0x4f5da2:fix_sampler
Dec 23, 2021

Conversation

@0x4f5da2 (Contributor)

Motivation

This PR fixes the bug introduced in #588

Modification

Add `sampler_cfg` for the validation dataloader in `mmcls/apis/train.py`.
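For context, the fix amounts to extracting the sampler config for the validation dataloader the same way the training dataloader already does. The sketch below is a self-contained, plain-Python illustration of that shape; the helper names and the `RepeatAugSampler` type echo mmcls but are not the exact diff:

```python
import copy


def get_sampler_cfg(cfg):
    # Illustrative stand-in for the helper this PR routes the validation
    # dataloader through: copy the sampler config out of the top-level cfg.
    return copy.deepcopy(cfg.get('sampler', None))


def dataloader_args(cfg, shuffle):
    # Sketch of the relevant build_dataloader plumbing: when a custom
    # sampler is configured, fold `shuffle` into it and disable the
    # DataLoader's own shuffling.
    args = {'shuffle': shuffle}
    sampler_cfg = get_sampler_cfg(cfg)
    if sampler_cfg:
        sampler_cfg['shuffle'] = shuffle  # shuffle=False for val/test
        args['sampler_cfg'] = sampler_cfg
        args['shuffle'] = False
    return args


cfg = {'sampler': {'type': 'RepeatAugSampler'}}
# The bug: the val loader built inside train.py never received sampler_cfg.
# After the fix, both loaders go through the same extraction:
train_args = dataloader_args(cfg, shuffle=True)
val_args = dataloader_args(cfg, shuffle=False)
print(train_args['sampler_cfg']['shuffle'])  # True
print(val_args['sampler_cfg']['shuffle'])    # False
```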


codecov bot commented Dec 22, 2021

Codecov Report

Merging #611 (21fd957) into master (d5ae255) will increase coverage by 0.44%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #611      +/-   ##
==========================================
+ Coverage   81.33%   81.78%   +0.44%     
==========================================
  Files         116      116              
  Lines        6720     6709      -11     
  Branches     1156     1151       -5     
==========================================
+ Hits         5466     5487      +21     
+ Misses       1103     1069      -34     
- Partials      151      153       +2     
Flag       Coverage Δ
unittests  81.78% <100.00%> (+0.44%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmcls/apis/train.py 21.25% <ø> (-6.53%) ⬇️
mmcls/datasets/builder.py 73.21% <100.00%> (+27.06%) ⬆️
mmcls/datasets/samplers/repeat_aug.py 90.47% <100.00%> (+5.36%) ⬆️
mmcls/datasets/samplers/distributed_sampler.py 84.00% <0.00%> (+56.00%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d5ae255...21fd957.

@0x4f5da2 0x4f5da2 requested a review from mzr1996 December 22, 2021 04:51
@mzr1996 (Member)

mzr1996 commented Dec 22, 2021

Please also add it in tools/test.py

@0x4f5da2 (Contributor, Author)

> Please also add it in tools/test.py

Fixed.

@mzr1996 mzr1996 requested a review from RangiLyu December 23, 2021 01:39
    """
    rank, world_size = get_dist_info()

    sampler_cfg = get_sampler_cfg(cfg, distributed=dist)
Member

It is not a good choice to pass the whole config into a module's builder.

What if we change it like this?

    rank, world_size = get_dist_info()
    # default logic
    if dist:
        sampler = DistributedSampler(
            dataset, world_size, rank, shuffle=shuffle, round_up=round_up)
        batch_size = samples_per_gpu
        num_workers = workers_per_gpu
    else:
        sampler = None
        batch_size = num_gpus * samples_per_gpu
        num_workers = num_gpus * workers_per_gpu

    # build custom sampler logic
    if sampler_cfg:
        # shuffle=False when val and test
        sampler_cfg.update(shuffle=shuffle)
        sampler = build_sampler(
            sampler_cfg,
            default_args=dict(dataset=dataset, num_replicas=world_size, rank=rank))

    # If sampler exists, turn off dataloader shuffle
    if sampler is not None:
        shuffle = False

@mzr1996 (Member) Dec 23, 2021

I think it's better to use:

    rank, world_size = get_dist_info()

    # Custom sampler logic
    if sampler_cfg:
        # shuffle=False when val and test
        sampler_cfg.update(shuffle=shuffle)
        sampler = build_sampler(
            sampler_cfg,
            default_args=dict(dataset=dataset, num_replicas=world_size, rank=rank))
    # Default sampler logic
    elif dist:
        sampler = DistributedSampler(
            dataset, world_size, rank, shuffle=shuffle, round_up=round_up)
    else:
        sampler = None

    # If sampler exists, turn off dataloader shuffle
    if sampler is not None:
        shuffle = False

    if dist:
        batch_size = samples_per_gpu
        num_workers = workers_per_gpu
    else:
        batch_size = num_gpus * samples_per_gpu
        num_workers = num_gpus * workers_per_gpu

This way we avoid an unnecessary instantiation of DistributedSampler, and we separate the sampler logic from the batch_size/num_workers logic.
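The point about the wasted instantiation can be demonstrated with a runnable, dependency-free sketch of the reordered logic (plain-Python stand-ins for the mmcv classes, with a construction counter; the real classes and builder differ):

```python
class DistributedSampler:
    # Stand-in for mmcv's sampler; counts constructions so we can verify
    # the custom-sampler branch never builds one.
    constructed = 0

    def __init__(self, dataset, num_replicas, rank, shuffle=True, round_up=True):
        DistributedSampler.constructed += 1
        self.shuffle = shuffle


def build_sampler(sampler_cfg, default_args):
    # Stand-in for the registry-based builder: merge cfg over the defaults.
    return dict(default_args, **sampler_cfg)


def choose_sampler(sampler_cfg, dist, dataset, world_size, rank,
                   shuffle=True, round_up=True):
    if sampler_cfg:  # custom sampler logic comes first
        sampler_cfg = dict(sampler_cfg, shuffle=shuffle)
        sampler = build_sampler(
            sampler_cfg,
            default_args=dict(dataset=dataset, num_replicas=world_size,
                              rank=rank))
    elif dist:       # default distributed logic
        sampler = DistributedSampler(
            dataset, world_size, rank, shuffle=shuffle, round_up=round_up)
    else:
        sampler = None
    return sampler


custom = choose_sampler({'type': 'RepeatAugSampler'}, dist=True,
                        dataset=[], world_size=8, rank=0)
print(DistributedSampler.constructed)  # 0: no throwaway DistributedSampler
default = choose_sampler(None, dist=True, dataset=[], world_size=8, rank=0)
print(DistributedSampler.constructed)  # 1
```

With the first proposal, the `dist` branch would have built a DistributedSampler even when a custom sampler immediately replaced it.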

    SAMPLERS = Registry('sampler')


    def get_sampler_cfg(cfg, distributed):
Member

MME needs to handle DistributedSampler and InfiniteSampler as the default sampler depending on the runner type, so it has a set_default_sampler_cfg function.
But MMClassification has only one default sampler, DistributedSampler, so I think this function is unnecessary here.
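As a rough illustration of the registry pattern under discussion, a minimal sampler registry can be sketched in plain Python (a dict standing in for mmcv's Registry; the RepeatAugSampler signature below is illustrative, not the real mmcls one):

```python
SAMPLERS = {}


def register_sampler(cls):
    # Decorator standing in for @SAMPLERS.register_module().
    SAMPLERS[cls.__name__] = cls
    return cls


def build_sampler(cfg, default_args=None):
    # Look up cfg['type'] in the registry and construct it, letting
    # explicit cfg keys override the defaults injected by the caller.
    if cfg is None:
        return None
    cfg = dict(cfg)
    sampler_cls = SAMPLERS[cfg.pop('type')]
    return sampler_cls(**{**(default_args or {}), **cfg})


@register_sampler
class RepeatAugSampler:
    def __init__(self, dataset, num_replicas=1, rank=0, shuffle=True):
        self.dataset = dataset
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle


sampler = build_sampler(
    dict(type='RepeatAugSampler', shuffle=False),
    default_args=dict(dataset=list(range(10)), num_replicas=4, rank=1))
print(type(sampler).__name__, sampler.shuffle)  # RepeatAugSampler False
```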

@mzr1996 (Member) left a comment

LGTM

@mzr1996 mzr1996 merged commit da39ca6 into open-mmlab:master Dec 23, 2021
mzr1996 added a commit to mzr1996/mmpretrain that referenced this pull request Nov 24, 2022
* sampler bugfixes

* sampler bugfixes

* reorganize code

* minor fixes

* reorganize code

* minor fixes

* Use `mmcv.runner.get_dist_info` instead of `dist` package to get rank
and world size.

* Add `build_dataloader` unit tests and fix sampler's unit tests.

* Fix unit tests

* Fix unit tests

Co-authored-by: mzr1996 <mzr1996@163.com>

3 participants