empty gpu memory cache between different benchmark cases (#2242)#2243
Conversation
I was just having a conversation with someone recently about how we profile results, specifically whether we're concerned about how performance can change under the full workload versus isolated benchmarks. Might have been with @drzejan2. This is a pretty interesting example where even the allocator can interfere. The caching allocator has worked really well for us for a long time, but it might be time for us to start placing allocations more carefully in real workloads. I'm fine with the change, just an interesting concrete example of concerns that are often hand-wavy.
That's a nice catch, @liqiangxl! I thought we didn't include memory allocations in the measurement. |
@csarofeen thanks for letting me know about this fix. |
Fixes #2242
(1) Reason: PyTorch uses a caching memory allocator to speed up memory allocations. Testing multiple cases in a single run leaves less available GPU memory for the last case and more allocated pieces in the memory pool, so it may take longer to find an appropriate piece of memory.
(2) Fix: clear the memory pool between benchmark cases.
(3) Results: after the fix, the performance difference dropped from 13% to 1%.
Before fix: running 7 cases, the performance of the last case is 1.042 TB/s (1067/1024)
After fix: running 7 cases, the performance of the last case is 1.190 TB/s
Running only the last case: 1.178 TB/s
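For illustration, the fix described above amounts to releasing the caching allocator's unused blocks between cases. A minimal sketch of that pattern, assuming a PyTorch-based harness (the `benchmark_cases` helper and its wall-clock timing are hypothetical stand-ins for the actual benchmark infrastructure, which would use CUDA events):

```python
import time

import torch


def benchmark_cases(cases, iters=10):
    """Time each callable in `cases`, clearing the CUDA caching
    allocator's pool between cases so one case's cached blocks
    don't fragment the pool seen by the next (a sketch of the
    pattern, not the actual harness)."""
    results = []
    for fn in cases:
        if torch.cuda.is_available():
            # Return cached, unused blocks to the driver so the next
            # case starts from a clean memory pool.
            torch.cuda.empty_cache()
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        results.append(time.perf_counter() - start)
    return results
```

Note that `torch.cuda.empty_cache()` does not free tensors that are still alive; it only releases blocks the allocator is holding in reserve, which is exactly the state that accumulates across consecutive benchmark cases.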