Seanie Lee
Here is the [code](https://gist.github.com/seanie12/ba12b5d1154f5012208cda81d9f489c3) for fine-tuning the Chinese BERT model. You can download the pretrained Chinese BERT model we use from [here](https://huggingface.co/bert-base-chinese).
Thanks for the suggestion!
Hello, this error occurs because g2pm is not perfect. As you can see in the README, it mispredicts 2.15% of the test set. Also, in the image you attached you input only a single character; in such cases our model may mispredict...
Sorry, I cannot understand Chinese. Could you tell me what the issue is in English? Google Translate says the output of our model is different from the one described...
The model even makes mistakes on common polyphonic words. g2pm is a very simple model for polyphone disambiguation, and there is still large room for improvement.
Hi, as mentioned in the previous issue, our dataset does not cover all possible Chinese polyphonic characters. We collect Chinese sentences from Wikipedia and label them, so some polyphonic...
Hi, in my experiments the performance of BERT is very sensitive to the choice of optimizer and learning rate. I will upload the scripts for training BERT as soon as...
Yes, the number of outputs of the fc layer is the number of all possible pinyins for polyphonic characters. Here is the [code](https://gist.github.com/seanie12/ba12b5d1154f5012208cda81d9f489c3) for training BERT.
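To illustrate the idea, here is a minimal sketch of how such an fc layer's output is typically used at prediction time: the layer emits one logit per pinyin in the global label set, and only the pinyins valid for the target character are compared. All names, the example logits, and the candidate-masking step are illustrative assumptions, not the repository's actual code.

```python
import math

def predict_pinyin(logits, class2idx, candidates):
    """Pick the highest-scoring pinyin among the character's candidate pinyins.

    logits: one score per pinyin class (the fc layer's output).
    class2idx: maps each pinyin string to its index in the logits vector.
    candidates: the pinyins that are valid readings for the target character.
    """
    best, best_score = None, -math.inf
    for pinyin in candidates:
        score = logits[class2idx[pinyin]]
        if score > best_score:
            best, best_score = pinyin, score
    return best

# Toy label set and scores (assumed values for illustration only).
class2idx = {"zhong1": 0, "zhong4": 1, "le5": 2, "liao3": 3}
logits = [0.2, 1.5, -0.3, 0.7]  # fc output: one logit per pinyin class
print(predict_pinyin(logits, class2idx, ["zhong1", "zhong4"]))  # zhong4
```

Restricting the argmax to the character's own candidates means the softmax over the full pinyin set can never produce a reading that is impossible for that character.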
Could you explain it in more detail?
Hi, class2idx is a dictionary that maps each pinyin to its own id, so the id corresponds to an index of the softmax layer. There are two reasons why there is...
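A minimal sketch of how such a class2idx mapping is typically built from the labeled data (the helper name and sample labels are assumptions, not the repository's actual code):

```python
def build_class2idx(labels):
    """Assign each distinct pinyin label an integer id in first-seen order.

    The resulting id doubles as that pinyin's index in the softmax layer,
    so len(class2idx) is the fc layer's output dimension.
    """
    class2idx = {}
    for pinyin in labels:
        if pinyin not in class2idx:
            class2idx[pinyin] = len(class2idx)
    return class2idx

# Toy training labels (illustrative only).
labels = ["zhong1", "zhong4", "le5", "liao3", "zhong1"]
class2idx = build_class2idx(labels)
num_classes = len(class2idx)  # 4 distinct pinyins -> fc output size 4
```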