[FEATURE] In compilation server clean memory after every compilation #490

Merged
vadiklyutiy merged 2 commits into main from vadim/compserver-gc on Feb 11, 2025

Conversation

@vadiklyutiy
Collaborator

In compilation server clean memory after every compilation

After a compilation finishes, the server can keep roughly 2 GB of memory that is allocated but no longer used. Free it after every compilation.
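The fix can be sketched as follows (a minimal illustration, not the actual hidet compilation-server code; `handle_compilation_request` and `compile_fn` are hypothetical names):

```python
import gc

def handle_compilation_request(request, compile_fn):
    """Hypothetical request handler: compile, then reclaim memory.

    Temporaries and caches built up during a compilation can keep a
    large amount of memory allocated even though it is no longer used;
    an explicit gc.collect() after every job returns it sooner.
    """
    result = compile_fn(request)
    # Force a full garbage collection so objects freed during this
    # compilation are actually released before the next request.
    gc.collect()
    return result

# Toy usage: "compiling" here is just a stand-in computation.
print(handle_compilation_request("dummy", lambda req: f"compiled:{req}"))
```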

@vadiklyutiy added the `enhancement` (New feature or request) label on Jan 31, 2025
@vadiklyutiy self-assigned this on Jan 31, 2025
@vadiklyutiy
Collaborator Author

@yaoyaoding
A kind reminder about this PR.

Member

@yaoyaoding left a comment


LGTM, thanks @vadiklyutiy !

@vadiklyutiy merged commit f6fb9ea into main on Feb 11, 2025
@vadiklyutiy deleted the vadim/compserver-gc branch on February 11, 2025 at 21:22
yaoyaoding pushed a commit that referenced this pull request May 12, 2025
…490)

In compilation server clean memory after every compilation

Consuming memory after compilation might be around 2GB. It is not used
but just allocated memory. Free it.

(cherry picked from commit ddc886682101c7368b98979285c4a2494025f20e)
tatianashp pushed a commit that referenced this pull request May 21, 2025
…490)

In compilation server clean memory after every compilation

Consuming memory after compilation might be around 2GB. It is not used
but just allocated memory. Free it.

(cherry picked from commit ddc886682101c7368b98979285c4a2494025f20e)
AndreSlavescu pushed a commit to AndreSlavescu/hidet that referenced this pull request May 31, 2025
…c_run in optimize (hidet-org#490)

### PR Comment - Symbolic Execution in `get_flow_graph`

To optimize the generation of the flow graph in the `torch.compile`
frontend with Hidet, we switched to using **symbolic execution** in the
`get_flow_graph` function. This change avoids compiling the entire
intermediate function, which is not used in the final fused function,
resulting in significant time savings during this phase of compilation.
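The idea can be illustrated with a toy symbolic-execution sketch (this is not the actual hidet `get_flow_graph` code; `SymTensor` and the `sym_*` helpers are hypothetical): instead of executing intermediate operators on real tensors, we propagate placeholders that only record shapes and the operator trace, so no intermediate function needs to be compiled or run.

```python
class SymTensor:
    """Toy symbolic tensor: carries a shape and the op that produced it,
    but never holds real data, so nothing is compiled or executed."""
    def __init__(self, shape, op=None, inputs=()):
        self.shape, self.op, self.inputs = shape, op, list(inputs)

def sym_matmul(a, b):
    # Shape inference only: (m, k) @ (k, n) -> (m, n), no arithmetic.
    assert a.shape[-1] == b.shape[0]
    return SymTensor((a.shape[0], b.shape[1]), "matmul", (a, b))

def sym_relu(x):
    return SymTensor(x.shape, "relu", (x,))

# Build the "flow graph" symbolically: shapes are known, no math is done.
x = SymTensor((128, 512))
w = SymTensor((512, 1024))
y = sym_relu(sym_matmul(x, w))
print(y.op, y.shape)
```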

---

### PR Comment - Optimization for the `optimize()` Function in
`torch.compile` Frontend

The `optimize()` function in the `torch.compile` frontend was spending a
significant amount of time compiling intermediate results. By computing
those intermediate results with PyTorch's eager mode, whose operators are
already compiled, we eliminated the need to compile our own code for them.
This substantially reduces **compilation** time without increasing
**inference** time.
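A minimal sketch of the dispatch idea (with a stand-in "eager backend" instead of real PyTorch; `EAGER_BACKEND`, `eval_intermediate`, and `expensive_compile` are hypothetical names, not hidet APIs): when the optimizer needs the concrete value of an intermediate op, it asks the already-compiled eager backend instead of compiling a one-off kernel.

```python
import operator

# Stand-in for PyTorch eager ops, which are already compiled natively:
EAGER_BACKEND = {"add": operator.add, "mul": operator.mul}

def expensive_compile(op_name):
    # Placeholder for the slow path this optimization avoids.
    raise RuntimeError(f"no eager implementation for {op_name!r}")

def eval_intermediate(op_name, *args):
    """Evaluate an intermediate op eagerly instead of compiling it."""
    fn = EAGER_BACKEND.get(op_name)
    if fn is None:
        fn = expensive_compile(op_name)  # only for ops eager can't handle
    return fn(*args)

print(eval_intermediate("add", 2, 3))  # computed eagerly, nothing compiled
```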

#### Performance Results for ResNet50 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 105.19 sec | 103.30 sec | 102.66 sec | 102.51 sec | 103.83 sec | **103.49 sec** |
| **without_eager_inference_time** | 17.18 sec | 16.12 sec | 16.04 sec | 16.12 sec | 16.08 sec | **16.30 sec** |
| **with_eager_total_time** | 22.90 sec | 22.89 sec | 22.55 sec | 22.58 sec | 22.61 sec | **22.71 sec** |
| **with_eager_inference_time** | 15.98 sec | 15.97 sec | 16.06 sec | 16.06 sec | 16.09 sec | **16.03 sec** |
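For reference, the Geomean column is the geometric mean of the five runs; a quick sketch for the `without_eager_total_time` row above:

```python
import math

runs = [105.19, 103.30, 102.66, 102.51, 103.83]  # without_eager_total_time
geomean = math.exp(sum(math.log(r) for r in runs) / len(runs))
print(round(geomean, 2))  # matches the 103.49 sec in the table
```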

---

#### Updated Performance Results for DenseNet121 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 331.08 sec | 332.79 sec | 331.97 sec | 326.15 sec | 325.28 sec | **329.44 sec** |
| **without_eager_inference_time** | 21.01 sec | 22.06 sec | 20.84 sec | 20.84 sec | 20.85 sec | **21.12 sec** |
| **with_eager_total_time** | 66.49 sec | 66.95 sec | 66.68 sec | 66.70 sec | 66.66 sec | **66.70 sec** |
| **with_eager_inference_time** | 20.83 sec | 20.90 sec | 20.90 sec | 20.88 sec | 20.91 sec | **20.88 sec** |

---

#### Performance Results for ResNet50 - Search Space 2 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 433.88 sec | 434.40 sec | 438.50 sec | 431.90 sec | 432.87 sec | **434.30 sec** |
| **without_eager_inference_time** | 11.05 sec | 11.02 sec | 11.01 sec | 11.04 sec | 11.00 sec | **11.03 sec** |
| **with_eager_total_time** | 355.77 sec | 356.84 sec | 355.12 sec | 354.91 sec | 355.45 sec | **355.62 sec** |
| **with_eager_inference_time** | 10.97 sec | 10.95 sec | 10.93 sec | 10.96 sec | 10.98 sec | **10.96 sec** |

---

#### Performance Results for DenseNet121 - Search Space 2 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 1132.38 sec | 1123.16 sec | 1113.10 sec | 1115.51 sec | 1117.07 sec | **1120.22 sec** |
| **without_eager_inference_time** | 14.96 sec | 14.95 sec | 14.95 sec | 14.94 sec | 15.01 sec | **14.96 sec** |
| **with_eager_total_time** | 824.45 sec | 833.04 sec | 827.02 sec | 832.37 sec | 819.60 sec | **827.28 sec** |
| **with_eager_inference_time** | 14.95 sec | 14.98 sec | 14.95 sec | 14.95 sec | 14.97 sec | **14.96 sec** |

---

#### Analysis:

- **ResNet50 Total Time**: With `--eager`, the total time is reduced
from **103.49 seconds** to **22.71 seconds** in the default search
space, and from **434.30 seconds** to **355.62 seconds** in search space
2. This is an improvement of **~78.1%** in the default space and
**~18.1%** in search space 2.
  
- **ResNet50 Inference Time**: Inference time is slightly reduced in
both cases: from **16.30 sec** to **16.03 sec** in the default search
space, and from **11.03 sec** to **10.96 sec** in search space 2.

- **DenseNet121 Total Time**: With `--eager`, the total time is reduced
from **329.44 seconds** to **66.70 seconds** in the default search
space, and from **1120.22 seconds** to **827.28 seconds** in search
space 2. This is an improvement of **~79.8%** in the default space and
**~26.2%** in search space 2.
  
- **DenseNet121 Inference Time**: Inference time is essentially unchanged
in both cases: **20.88 sec** in the default search space and **14.96 sec**
in search space 2.

---

#### Measurement Details:
- **Model used for measurement**: `resnet50` and `densenet121`, with
batch size of 128 and input shape `128x3x224x224`.
- **Data type**: `float16`.
- **Backend**: `hidet`.
- **Mode**: `default`.
- **Search space**: `0` for default results, and `2` for the second set
of results for ResNet50 and DenseNet121.

---------

Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>