task-aware-distillation: Request for Additional Code and Hyperparameters

Dear Prof. Chen Liang,

We're working on a project involving language model compression and your work with TED has been insightful. We are now trying to compare TED with KD and LWD techniques, and evaluate TED on GLUE tasks.

We are having difficulty reproducing the GLUE benchmark results reported in your paper. If possible, could you share the code for the baseline KD and LWD frameworks and the code for evaluating TED on GLUE?

If sharing the code is not feasible, could you please provide the hyperparameters used in your experiments? This would greatly assist our research.

Best, Chengfei Liu

Jul 29 '23 02:07 jian53286

Hi Chengfei, thanks for your interest in TED. We will be adding the GLUE code and hyperparameters soon. Please stay tuned.

Jul 31 '23 17:07 cliang1453

Thanks for your insightful work! Could you share your code for GPT-2? It would be super helpful for my research. Thanks a lot!

Aug 14 '23 11:08 aaronma2020

Hi @jian53286, the code for GLUE has been released. Hi @aaronma2020, we will work on adding GPT-2 soon.

Aug 28 '23 04:08 cliang1453