Fix Dropout Implementation in Graphormer #24817
Conversation
This commit corrects the dropout implementation in Graphormer, aligning it with the original implementation and improving performance. Specifically: 1. The `attention_dropout` variable, intended for use in GraphormerMultiheadAttention, was defined but not used. This has been corrected to use `attention_dropout` instead of the regular `dropout`. 2. The `activation_dropout` for the activations in the feed-forward layers was missing. Instead, the regular `dropout` was used. This commit adds `activation_dropout` to the feed-forward layers. These changes ensure the dropout implementation matches the original Graphormer and delivers empirically better performance.
Hi @clefourrier and others,

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It still needs to be addressed! @clefourrier or whoever is responsible for it, am I doing something wrong with my pull request, or what is taking so long for anyone to answer?! How am I supposed to contribute if I am being ignored?
Hi @alexanderkrauck ! |
clefourrier
left a comment
LGTM, thank you for this fix
What does this PR do?
This commit corrects the dropout implementation in Graphormer, aligning it with the original implementation (https://github.com/microsoft/Graphormer) and improving performance. Specifically:
1. The `attention_dropout` variable, intended for use in GraphormerMultiheadAttention, was defined but not used. This has been corrected to use `attention_dropout` instead of the regular `dropout`.
2. The `activation_dropout` for the activations in the feed-forward layers was missing. Instead, the regular `dropout` was used. This commit adds `activation_dropout` to the feed-forward layers and to the GraphormerConfig, including documentation.

These changes ensure the dropout implementation matches the original Graphormer and delivers empirically better performance.
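The two fixes can be sketched as follows. This is a minimal, illustrative encoder-layer fragment, not the actual Graphormer code: the class name, field names, and default values are hypothetical, and layer norms and other details of the real model are omitted.

```python
import torch
import torch.nn as nn


class GraphormerEncoderLayerSketch(nn.Module):
    """Illustrative sketch of the dropout fix; names and defaults are hypothetical."""

    def __init__(self, embedding_dim=768, ffn_dim=3072, num_heads=8,
                 dropout=0.1, attention_dropout=0.1, activation_dropout=0.1):
        super().__init__()
        # Fix 1: the attention module now receives `attention_dropout`
        # (previously the regular `dropout` was passed here).
        self.self_attn = nn.MultiheadAttention(
            embedding_dim, num_heads, dropout=attention_dropout
        )
        self.fc1 = nn.Linear(embedding_dim, ffn_dim)
        self.fc2 = nn.Linear(ffn_dim, embedding_dim)
        self.dropout = nn.Dropout(dropout)
        # Fix 2: a separate dropout for the feed-forward activations
        # (previously missing; the regular `dropout` was used instead).
        self.activation_dropout = nn.Dropout(activation_dropout)

    def forward(self, x):
        # x: (seq_len, batch, embedding_dim)
        attn_out, _ = self.self_attn(x, x, x)
        x = x + self.dropout(attn_out)
        h = torch.relu(self.fc1(x))
        h = self.activation_dropout(h)  # applied to the FFN activation
        x = x + self.dropout(self.fc2(h))
        return x
```

Keeping the three rates as separate config fields lets each dropout be tuned independently, matching the behavior of the original Microsoft Graphormer implementation.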
Fixes # (issue)
Before submitting

- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@clefourrier