def _make_latent_generator_tokenizer() -> PreTrainedTokenizerFast:
    """Create a `PreTrainedTokenizerFast` object for tokenization of protein structure latent generator sequences.
Should we make this more general and say "tokenization of 3D coordinates"?
I was going to add the same comment.
As in, rename it to something like _make_3d_coordinates_tokenizer, or just change the description?
I think renaming the whole tokenizer class and related functions to something that includes 3D coordinates makes sense, since otherwise it's not immediately obvious what the tokenizer is for.
from ._make_pretrained_tokenizer_fast import make_pretrained_tokenizer_fast

LG_VOCAB = {'<cls>': 0, '<pad>': 1, '<eos>': 2, '<unk>': 3, '<mask>': 4, '.': 5, 'a': 6, 'b': 7, 'c': 8,
Not a strong preference, but this could be a txt file? Either way works though.
Can we keep it as a dictionary? The vocab.txt format just ends up adding more operations to get back to the dictionary, and I like the simplicity of being able to use the dictionary elsewhere.
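For reference, a minimal sketch of the extra operations a txt file would involve. It assumes one token per line with the line index used as the token id, which may not match the layout of the saved file in this PR. `save_vocab_txt`, `load_vocab_txt`, and `VOCAB_PREFIX` are hypothetical names, and `VOCAB_PREFIX` holds only the entries visible in the diff above, not the full `LG_VOCAB`.

```python
from pathlib import Path
from typing import Dict

# Hypothetical stand-in: only the entries visible in the diff, not the full LG_VOCAB.
VOCAB_PREFIX: Dict[str, int] = {
    '<cls>': 0, '<pad>': 1, '<eos>': 2, '<unk>': 3, '<mask>': 4,
    '.': 5, 'a': 6, 'b': 7, 'c': 8,
}


def save_vocab_txt(vocab: Dict[str, int], path: Path) -> None:
    """Write one token per line, ordered by token id (assumes ids are 0..n-1)."""
    tokens = sorted(vocab, key=vocab.get)
    path.write_text("\n".join(tokens) + "\n", encoding="utf-8")


def load_vocab_txt(path: Path) -> Dict[str, int]:
    """Rebuild the token-to-id mapping, using the line index as the id."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {token: index for index, token in enumerate(lines)}
```

The round trip only recovers the original dictionary if the ids are contiguous and start at 0, which holds for the visible prefix but is an assumption for the rest of the vocabulary.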
from ._make_pretrained_tokenizer_fast import make_pretrained_tokenizer_fast

LG_VOCAB = {
I think the file gets a bit too long when these are defined as a dictionary. It might be better to just use the saved txt file instead.
No description provided.