Seanie Lee
Here is the [code](https://gist.github.com/seanie12/ba12b5d1154f5012208cda81d9f489c3) for fine-tuning the Chinese BERT model. You can download the pretrained Chinese BERT model we use from [here](https://huggingface.co/bert-base-chinese).
Thanks for the suggestion!
Hello, this error occurs because g2pm is not perfect. As you can see in the README, it mispredicts 2.15% of the test set. Also, in the image you attached you input only a single character; in such cases our model may mispredict...
Sorry, I cannot understand Chinese. Could you tell me what the issue is in English? Google Translate says the output of our model is different from the one described...
The model even makes mistakes on common polyphonic words. g2pm is a very simple model for polyphone disambiguation, and there is still large room for improvement.
Hi, as mentioned in the previous issue, our dataset does not cover all possible Chinese polyphonic characters. We collect Chinese sentences from Wikipedia and label them, so some polyphonic...
Hi, in my experiments the performance of BERT is very sensitive to the choice of optimizer and learning rate. I will upload the scripts for training BERT as soon as...
Yes, the number of outputs of the fc layer is the number of all possible pinyins for polyphonic characters. Here is the [code](https://gist.github.com/seanie12/ba12b5d1154f5012208cda81d9f489c3) for training BERT.
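To illustrate the idea, here is a minimal sketch of how such an fc layer's output is typically used at prediction time: the layer emits one logit per pinyin in the global label set, and only the pinyins valid for the target character are compared. All names, the example logits, and the candidate-masking step are illustrative assumptions, not the repository's actual code.

```python
import math

def predict_pinyin(logits, class2idx, candidates):
    """Pick the highest-scoring pinyin among the character's candidate pinyins.

    logits: one score per pinyin class (the fc layer's output).
    class2idx: maps each pinyin string to its index in the logits vector.
    candidates: the pinyins that are valid readings for the target character.
    """
    best, best_score = None, -math.inf
    for pinyin in candidates:
        score = logits[class2idx[pinyin]]
        if score > best_score:
            best, best_score = pinyin, score
    return best

# Toy label set and scores (assumed values for illustration only).
class2idx = {"zhong1": 0, "zhong4": 1, "le5": 2, "liao3": 3}
logits = [0.2, 1.5, -0.3, 0.7]  # fc output: one logit per pinyin class
print(predict_pinyin(logits, class2idx, ["zhong1", "zhong4"]))  # zhong4
```

Restricting the argmax to the character's own candidates means the softmax over the full pinyin set can never produce a reading that is impossible for that character.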
Could you explain it in more detail?
Hi, class2idx is a dictionary that maps each pinyin to its own id, so the id corresponds to an index of the softmax layer. There are two reasons why there is...
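A minimal sketch of how such a class2idx mapping is typically built from the labeled data (the helper name and sample labels are assumptions, not the repository's actual code):

```python
def build_class2idx(labels):
    """Assign each distinct pinyin label an integer id in first-seen order.

    The resulting id doubles as that pinyin's index in the softmax layer,
    so len(class2idx) is the fc layer's output dimension.
    """
    class2idx = {}
    for pinyin in labels:
        if pinyin not in class2idx:
            class2idx[pinyin] = len(class2idx)
    return class2idx

# Toy training labels (illustrative only).
labels = ["zhong1", "zhong4", "le5", "liao3", "zhong1"]
class2idx = build_class2idx(labels)
num_classes = len(class2idx)  # 4 distinct pinyins -> fc output size 4
```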