[FEATURE] In compilation server clean memory after every compilation #490

Merged
vadiklyutiy merged 2 commits into main from vadim/compserver-gc on Feb 11, 2025

Conversation

@vadiklyutiy
Collaborator

In compilation server clean memory after every compilation

After a compilation finishes, the server can keep roughly 2 GB of memory that is allocated but no longer used. Free it after every compilation.
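The fix can be sketched as follows (a minimal illustration, not the actual hidet compilation-server code; `handle_compilation_request` and `compile_fn` are hypothetical names):

```python
import gc

def handle_compilation_request(request, compile_fn):
    """Hypothetical request handler: compile, then reclaim memory.

    Temporaries and caches built up during a compilation can keep a
    large amount of memory allocated even though it is no longer used;
    an explicit gc.collect() after every job returns it sooner.
    """
    result = compile_fn(request)
    # Force a full garbage collection so objects freed during this
    # compilation are actually released before the next request.
    gc.collect()
    return result

# Toy usage: "compiling" here is just a stand-in computation.
print(handle_compilation_request("dummy", lambda req: f"compiled:{req}"))
```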

@vadiklyutiy added the `enhancement` (New feature or request) label on Jan 31, 2025
@vadiklyutiy self-assigned this on Jan 31, 2025
@vadiklyutiy
Collaborator Author

@yaoyaoding
A kind reminder about this PR.

Member

@yaoyaoding left a comment


LGTM, thanks @vadiklyutiy !

@vadiklyutiy merged commit f6fb9ea into main on Feb 11, 2025
@vadiklyutiy deleted the vadim/compserver-gc branch on February 11, 2025 at 21:22
yaoyaoding pushed a commit that referenced this pull request May 12, 2025
…490)

In compilation server clean memory after every compilation

Consuming memory after compilation might be around 2GB. It is not used
but just allocated memory. Free it.

(cherry picked from commit ddc886682101c7368b98979285c4a2494025f20e)
tatianashp pushed a commit that referenced this pull request May 21, 2025
…490)

In compilation server clean memory after every compilation

Consuming memory after compilation might be around 2GB. It is not used
but just allocated memory. Free it.

(cherry picked from commit ddc886682101c7368b98979285c4a2494025f20e)
AndreSlavescu pushed a commit to AndreSlavescu/hidet that referenced this pull request May 31, 2025
…c_run in optimize (hidet-org#490)

### PR Comment - Symbolic Execution in `get_flow_graph`

To optimize the generation of the flow graph in the `torch.compile`
frontend with Hidet, we switched to using **symbolic execution** in the
`get_flow_graph` function. This change avoids compiling the entire
intermediate function, which is not used in the final fused function,
resulting in significant time savings during this phase of compilation.
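The idea can be illustrated with a toy symbolic-execution sketch (this is not the actual hidet `get_flow_graph` code; `SymTensor` and the `sym_*` helpers are hypothetical): instead of executing intermediate operators on real tensors, we propagate placeholders that only record shapes and the operator trace, so no intermediate function needs to be compiled or run.

```python
class SymTensor:
    """Toy symbolic tensor: carries a shape and the op that produced it,
    but never holds real data, so nothing is compiled or executed."""
    def __init__(self, shape, op=None, inputs=()):
        self.shape, self.op, self.inputs = shape, op, list(inputs)

def sym_matmul(a, b):
    # Shape inference only: (m, k) @ (k, n) -> (m, n), no arithmetic.
    assert a.shape[-1] == b.shape[0]
    return SymTensor((a.shape[0], b.shape[1]), "matmul", (a, b))

def sym_relu(x):
    return SymTensor(x.shape, "relu", (x,))

# Build the "flow graph" symbolically: shapes are known, no math is done.
x = SymTensor((128, 512))
w = SymTensor((512, 1024))
y = sym_relu(sym_matmul(x, w))
print(y.op, y.shape)
```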

---

### PR Comment - Optimization for the `optimize()` Function in
`torch.compile` Frontend

The `optimize()` function in the `torch.compile` frontend was spending a
significant amount of time compiling intermediate results. By computing
those intermediate results with PyTorch's eager mode, whose operators are
already compiled, we eliminated the need to compile our own code for them.
This substantially reduces **compilation** time without increasing
**inference** time.
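A minimal sketch of the dispatch idea (with a stand-in "eager backend" instead of real PyTorch; `EAGER_BACKEND`, `eval_intermediate`, and `expensive_compile` are hypothetical names, not hidet APIs): when the optimizer needs the concrete value of an intermediate op, it asks the already-compiled eager backend instead of compiling a one-off kernel.

```python
import operator

# Stand-in for PyTorch eager ops, which are already compiled natively:
EAGER_BACKEND = {"add": operator.add, "mul": operator.mul}

def expensive_compile(op_name):
    # Placeholder for the slow path this optimization avoids.
    raise RuntimeError(f"no eager implementation for {op_name!r}")

def eval_intermediate(op_name, *args):
    """Evaluate an intermediate op eagerly instead of compiling it."""
    fn = EAGER_BACKEND.get(op_name)
    if fn is None:
        fn = expensive_compile(op_name)  # only for ops eager can't handle
    return fn(*args)

print(eval_intermediate("add", 2, 3))  # computed eagerly, nothing compiled
```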

#### Performance Results for ResNet50 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 105.19 sec | 103.30 sec | 102.66 sec | 102.51 sec | 103.83 sec | **103.49 sec** |
| **without_eager_inference_time** | 17.18 sec | 16.12 sec | 16.04 sec | 16.12 sec | 16.08 sec | **16.30 sec** |
| **with_eager_total_time** | 22.90 sec | 22.89 sec | 22.55 sec | 22.58 sec | 22.61 sec | **22.71 sec** |
| **with_eager_inference_time** | 15.98 sec | 15.97 sec | 16.06 sec | 16.06 sec | 16.09 sec | **16.03 sec** |
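For reference, the Geomean column is the geometric mean of the five runs; a quick sketch for the `without_eager_total_time` row above:

```python
import math

runs = [105.19, 103.30, 102.66, 102.51, 103.83]  # without_eager_total_time
geomean = math.exp(sum(math.log(r) for r in runs) / len(runs))
print(round(geomean, 2))  # matches the 103.49 sec in the table
```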

---

#### Updated Performance Results for DenseNet121 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 331.08 sec | 332.79 sec | 331.97 sec | 326.15 sec | 325.28 sec | **329.44 sec** |
| **without_eager_inference_time** | 21.01 sec | 22.06 sec | 20.84 sec | 20.84 sec | 20.85 sec | **21.12 sec** |
| **with_eager_total_time** | 66.49 sec | 66.95 sec | 66.68 sec | 66.70 sec | 66.66 sec | **66.70 sec** |
| **with_eager_inference_time** | 20.83 sec | 20.90 sec | 20.90 sec | 20.88 sec | 20.91 sec | **20.88 sec** |

---

#### Performance Results for ResNet50 - Search Space 2 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 433.88 sec | 434.40 sec | 438.50 sec | 431.90 sec | 432.87 sec | **434.30 sec** |
| **without_eager_inference_time** | 11.05 sec | 11.02 sec | 11.01 sec | 11.04 sec | 11.00 sec | **11.03 sec** |
| **with_eager_total_time** | 355.77 sec | 356.84 sec | 355.12 sec | 354.91 sec | 355.45 sec | **355.62 sec** |
| **with_eager_inference_time** | 10.97 sec | 10.95 sec | 10.93 sec | 10.96 sec | 10.98 sec | **10.96 sec** |

---

#### Performance Results for DenseNet121 - Search Space 2 (over 5 runs):

| Configuration | Run1 | Run2 | Run3 | Run4 | Run5 | Geomean |
|---------------------------------|------------|------------|------------|------------|------------|----------------|
| **without_eager_total_time** | 1132.38 sec | 1123.16 sec | 1113.10 sec | 1115.51 sec | 1117.07 sec | **1120.22 sec** |
| **without_eager_inference_time** | 14.96 sec | 14.95 sec | 14.95 sec | 14.94 sec | 15.01 sec | **14.96 sec** |
| **with_eager_total_time** | 824.45 sec | 833.04 sec | 827.02 sec | 832.37 sec | 819.60 sec | **827.28 sec** |
| **with_eager_inference_time** | 14.95 sec | 14.98 sec | 14.95 sec | 14.95 sec | 14.97 sec | **14.96 sec** |

---

#### Analysis:

- **ResNet50 Total Time**: With `--eager`, the total time is reduced
from **103.49 seconds** to **22.71 seconds** in the default search
space, and from **434.30 seconds** to **355.62 seconds** in search space
2. This is an improvement of **~78.1%** in the default space and
**~18.1%** in search space 2.
  
- **ResNet50 Inference Time**: Inference time is slightly reduced in
both cases: from **16.30 sec** to **16.03 sec** in the default search
space, and from **11.03 sec** to **10.96 sec** in search space 2.

- **DenseNet121 Total Time**: With `--eager`, the total time is reduced
from **329.44 seconds** to **66.70 seconds** in the default search
space, and from **1120.22 seconds** to **827.28 seconds** in search
space 2. This is an improvement of **~79.8%** in the default space and
**~26.2%** in search space 2.
  
- **DenseNet121 Inference Time**: Inference time is essentially unchanged
in both cases: **20.88 sec** in the default search space and **14.96 sec**
in search space 2.

---

#### Measurement Details:
- **Model used for measurement**: `resnet50` and `densenet121`, with
batch size of 128 and input shape `128x3x224x224`.
- **Data type**: `float16`.
- **Backend**: `hidet`.
- **Mode**: `default`.
- **Search space**: `0` for default results, and `2` for the second set
of results for ResNet50 and DenseNet121.

---------

Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>