
feat(dictionary): support local and ignored tokens#853

Merged
killergerbah merged 7 commits into killergerbah:main from ShanaryS:dictionary-local-tokens
Jan 19, 2026

Conversation

@ShanaryS
Collaborator

@ShanaryS ShanaryS commented Jan 9, 2026

fixes #178

This allows collecting tokens locally without using Anki, or overriding statuses set by Anki.

  • Priority for tokens is Local > Anki word > Anki sentence.
  • Local tokens are not per track, unlike Anki tokens (but they are still per profile). This makes it easier for users to switch between tracks and have local tokens behave consistently.
    • If per-track local tokens are desired, they can be trivially added as an opt-in later.
  • Local tokens are mainly collected by hovering over a word and pressing the configured keyboard shortcut.
    • Coloring must be enabled, as it provides the feedback for collection.
    • Default keybinds are Q+[0-5] and Q+I for collecting and ignoring tokens.
  • Local tokens can also be collected using a clipboard import located in the Annotations tab.
    • Users who want to collect without coloring enabled, or to collect a token that was parsed differently, can use this feature.
  • Tokens can be exported and imported for backup. This only applies to local tokens, since Anki tokens can always be regenerated.
    • Imports are purely constructive: they will not delete existing tokens in the db and will use the higher of the existing and imported statuses. The imported states are only used if the token currently has no states.
  • Ignored tokens are treated as fully known; they are used for things such as names or places. Currently there is no special behavior for ignored tokens, but the distinction will be useful when computing statistics.
    • If desired, we can give ignored tokens their own color, but at present I don't see a reason to.
  • The buildAnkiCache button is now disabled when all tracks are disabled, with additional helper text if no Anki fields are configured.
  • ensureStoragePersisted() is used on any user interaction with this feature, for additional assurance that the db will not be deleted.
  • VideoPlayer subtitles are now treated similarly to SubtitleController: they are no longer split by line and instead use white-space: pre-wrap.

In the future, it would be beneficial to have a way to browse the local words by their tokens/lemmas, states, status, etc. This would also be a good place to delete tokens; currently users must hover and mark a token uncollected, or use the clipboard import with the uncollected status.
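The priority order above can be sketched as a simple resolution function. This is an illustrative sketch only, not the actual asbplayer implementation; the type names and the assumption that a missing source is represented by `undefined` are hypothetical.

```typescript
// Hypothetical sketch of the status priority described above:
// Local > Anki word > Anki sentence.
type TokenStatus = number; // e.g. 0 = uncollected .. 5 = known

interface TokenSources {
    local?: TokenStatus;
    ankiWord?: TokenStatus;
    ankiSentence?: TokenStatus;
}

// Return the first defined status in priority order. Nullish coalescing
// keeps a local status of 0 (uncollected) from being skipped.
function resolveStatus({ local, ankiWord, ankiSentence }: TokenSources): TokenStatus | undefined {
    return local ?? ankiWord ?? ankiSentence;
}
```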

@ShanaryS ShanaryS self-assigned this Jan 9, 2026
@ShanaryS ShanaryS added the enhancement New feature or request label Jan 9, 2026
@killergerbah
Owner

Btw, one thing I noticed when testing this is that the default underline style makes it hard to tell the boundaries between consecutively underlined tokens. With this feature, I think it becomes more important to know exactly which token will be saved when the user uses the shortcut. Solvable with a highlight effect? Or spacing between consecutively underlined tokens?

@ShanaryS ShanaryS force-pushed the dictionary-local-tokens branch from de44431 to 911f453 Compare January 12, 2026 23:51
@ShanaryS
Collaborator Author

I added a highlight for the token, but it doesn't seem to work in the SidePanel. Also, as a general note, the keybinds only work when the player is in focus. I think this is desirable, but it can be unintuitive if the user is multitasking. The highlight effect is always shown on hover, though, even if the player is not in focus.

@killergerbah
Owner

Hmm, maybe the highlight is not ideal then. We could just adjust the styles so that they have boundaries between them - at least those styles are connected to the settings already.

}

export enum TokenState {
IGNORED = 0, // If ever adding more states, they should go last (if adding colors for states, use a separate array from tokenStatusColors indexed by TokenState)
Owner

Is it accurate to say that "ignored" is effectively an alias for "fully known?" Asking just for my own information - I guess this is useful if we want to treat "ignored" differently from "fully known" in the future.

Collaborator Author

Yes, they are completely equivalent currently. Their use will become important for statistics, so users can ignore places and names in the known words calculation and perhaps other usages. I opted not to give these their own customizable color due to this but we can add support for it later (and any other states that we add).

I designed the states as an array on the tokens, even though some may be mutually exclusive. Handling those complexities is left to the code, to keep it flexible.

IGNORED = 0, // If ever adding more states, they should go last (if adding colors for states, use a separate array from tokenStatusColors indexed by TokenState)
}

export enum ApplyStrategy {
Owner

If the user accidentally saves a local token, is it not possible to undo that operation right now? I noticed REMOVE is not currently used outside of the dictionary DB. If that's true then maybe we can solve it with a token browser UI. I think you mentioned that at some point already.

Owner

Oh I see you wrote that in the PR description already, lol

Collaborator Author

Yeah, the plan for managing tokens is essentially exposing the db to the users so they can read and update it directly (where applicable). So they type in a string and we search for tokens and lemmas with that substring and display results. It could be organized by source where local tokens can be deleted or have applicable fields editable, but it would also expose what tokens were processed from Anki. Essentially think of the Anki browser but for the dictionary db.

This is fairly complex (on the UI/UX side) and doesn't offer much new functionality. Uncollecting via hover or the clipboard import should be fine until we get around to that.

Owner

Makes sense, can be for later

@ShanaryS
Collaborator Author

ShanaryS commented Jan 16, 2026

I updated all the functions to return objects and also am returning the keys instead of just the count for deletions. This gives more flexibility for future use.

I also changed how existingTokens are handled when importing. Now the existing states will always be preserved exactly, as I think this makes the most sense: if the token already exists, it should be safe to assume that's the state the user wants it in, since they recently collected it. Users may end up requesting more control over the import logic, such as a destructive replace or exposing ApplyStrategy, but for now I think using the highest status and the existing states is the best default.
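The merge semantics described here could be sketched roughly as below. This is a hedged illustration under the assumptions stated in the comment (highest status wins, existing states preserved); the `LocalToken` shape and function name are hypothetical, not the actual asbplayer code.

```typescript
// Illustrative sketch of the constructive import merge described above.
interface LocalToken {
    status: number;   // e.g. a TokenStatus ordinal
    states: number[]; // e.g. TokenState values such as IGNORED
}

function mergeImportedToken(existing: LocalToken | undefined, imported: LocalToken): LocalToken {
    if (existing === undefined) {
        return imported; // brand-new token: take the import as-is
    }
    return {
        status: Math.max(existing.status, imported.status), // highest status wins
        states: existing.states, // existing states are always preserved exactly
    };
}
```

Nothing is ever deleted by an import under this scheme, which keeps it safe to run repeatedly.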

Hmm maybe highlight is not ideal then. Could just adjust the styles so that they have boundaries between - at least those styles are connected to the settings already.

I removed the highlight on hover for now. I think it may be possible to detect whether the element is in focus and only display it then; we can't just use the window focus, as the SubtitlePlayer and Video have separate focuses for the keybinds. In general I don't think it will be a big deal, as users should know what word they are collecting, and we can assume that adjacent words are unlikely to share the same status once users have settled in.

const file = dictionaryDBFileInputRef.current?.files?.[0];
if (file === undefined) return;
await dictionaryProvider.importRecordLocalBulk(
JSON.parse(await file.text()),
Owner

Should we allow more flexibility with the file import? As it is, the user would have to be aware of the JSON schema for the token records, and also pre-lemmatize everything. Meanwhile, import via clipboard uses one token per line. I think users would expect both import from file and import from clipboard to use the same data format. For file import, we could support both - try a JSON parse first, then try splitting by newline.

If we do this, it would make sense to merge the current file import with the clipboard import dialog, as we would want to re-use the lemmatization logic etc.

I can take on this change in a different PR if you agree.
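The suggested fallback could be sketched as below. This is a minimal sketch of the idea only - the function name and the assumed record shape are hypothetical, and the real change would need to route the plain-text branch through the same lemmatization logic as the clipboard import.

```typescript
// Try to parse the file as JSON token records first; otherwise fall back
// to one-token-per-line plain text, matching the clipboard import format.
type ImportedRecords = object[] | string[];

function parseImportFile(text: string): ImportedRecords {
    try {
        const parsed = JSON.parse(text);
        if (Array.isArray(parsed)) {
            return parsed; // assume pre-lemmatized token records
        }
    } catch {
        // not JSON; fall through to plain-text handling
    }
    return text
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.length > 0); // these tokens still need lemmatization
}
```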

Collaborator Author

Yeah, I think that makes sense. The original goal of the export/import format was to be handled by asbplayer only; users would be able to convert to it, with the exception of the lemma. I think giving the option to parse the lemmas makes sense.

I want to clarify: the clipboard import does not split by newline, it parses whatever arbitrary text is there. But the user can delimit words by adding newlines so Yomitan won't see them as one word; we send the text as-is. This means, though, that long text will be slow, since we won't be batching - though we could by splitting on the newlines ourselves. I don't expect users to paste more than, say, 100 words though.

let status = item.status;
if (item.status == null || item.status < TokenStatus.UNKNOWN) {
if (!item.states.length) continue; // Status cannot be uncollected unless there is a state
if (status == null) status = TokenStatus.UNCOLLECTED;
Owner

status could be undefined as well?

Collaborator Author

It can't, but if it were it should be treated the same.

@NovaKing001

NovaKing001 commented Jan 19, 2026

@ShanaryS Hey, I've been using this PR for the past week and have a few ideas/suggestions.

Add more shortcuts

Add shortcuts to set the status of all uncollected words in a subtitle line to either 'unknown' or 'known'. This would help speed up the process of adding words.

Treat ignore as its own status

This would allow ignored words to have word readings, which is useful for languages such as Japanese, where proper names have ambiguous readings. I have a names dictionary installed in Yomitan, and having their readings show up would be a plus.

Clipboard issues

I know that you said that you don't expect users to import more than 100 words ... I tried to import 10k known words, and it wouldn't successfully add them. It would preview for a few minutes and allow me to save; however, saving didn't do anything, and no words were actually added. I have to rely on having an Anki deck with my known words marked as suspended to pseudo-import my words.

Highlighting

The highlighting feature was great for identifying false positives. Making it a toggle in settings would be great. The more dictionaries you have installed in Yomitan, the weirder the parses get.
For example, imagine trying to change the status of チャンク, but instead it changes 2チャン. Highlighting would help identify these Yomitan false positives.
image
image

Some characters are not able to have their status changed

Some characters, such as small よ (ょ), are tokenized but are unable to have their statuses changed
image
In the picture above, all of the small characters aren’t able to be changed.

Future feature?

To mitigate false positives, you can manually select the characters and have them be tokenized/corrected
for example
from
image
to
image
This is currently how it works in Lute
Screencast_20260118_235313.webm
It would probably work by segmenting those characters from the subtitle line and sending them separately to Yomitan; however, I don't know how that would work with every instance of those characters. I don't know if it would be feasible/worthwhile. It's probably best to wait for this PR yomidevs/yomitan#2254 to help with Yomitan's false positives/inaccurate parsing.

Thank you for your hard work! I probably missed a few things, let me know what you think. Thanks!

@killergerbah killergerbah merged commit 664ae42 into killergerbah:main Jan 19, 2026
1 check passed
@killergerbah killergerbah added this to the Extension v1.14.0 milestone Jan 19, 2026
@ShanaryS ShanaryS deleted the dictionary-local-tokens branch January 19, 2026 14:20
@ShanaryS
Collaborator Author

@NovaKing001

Add more shortcuts

This seems pretty niche and only gets less relevant as more words are marked.

Treat ignore as its own status

From a technical standpoint I don't think it's a good strategy; these flags on the cards should be independent of status. But we can add a toggle to show readings for ignored words.

Clipboard issues

Yeah, it's not meant for that big of an import - for that you'd use the import words button. It's meant for single words or sentences, maybe a paragraph. If you split it up yourself you'd be able to use it for that. But I find it hard to believe you have 10k words you can copy that are all the same state. It sounds like you are using Anki for these words, so why do you need to import that many?

Also what is the structure of the text you are pasting? Does it have punctuation, spaces, or line breaks?

Highlighting

Yeah, this can be brought back in some way, but some parts of the existing version were unintuitive.

Some characters are not able to have their status changed

These all work fine for me and shouldn't be handled any differently; perhaps you are on an outdated branch. The Yomitan parsing for this is incorrect, though.

To mitigate false positives, you can manually select the characters and have them be tokenized/corrected

I'm not sure how much value this would bring; if you know the tokenization is incorrect, there is no need to mark it. We could only ever apply the override to that specific subtitle event - we won't allow those characters to, say, always be parsed as a word. Language is too context-sensitive for us to make any parsing decisions on our side, so we will just have to rely on better parsers if this algorithm is insufficient.

@NovaKing001

@ShanaryS Thank you for responding

This seems pretty niche and only gets less relevant as more words are marked.

I have over 10,000 words marked, and I still find it tedious trying to mark all the false positives I'm getting. I think I just worded this wrong - what I really meant was adding some shortcuts related to bulk editing. I would hardly call this niche, as Migaku, Lute, and LingQ all have features related to bulk editing on the spot, without having to exit and access a separate word list. Maybe a shortcut could highlight all uncollected words, and the user could then choose a status using their status shortcuts.

The reason I'm asking for these bulk shortcuts is that when someone inevitably adds dynamic autopausing based on word statuses, it'll autopause on known words that are incorrectly parsed as separate unknown words. It will improve the workflow to quickly bulk edit words without having to hover over each word. Of course, this is all my opinion as someone who has used similar tools in the past; saving a couple of seconds really does improve the user experience.

Yeah it's not meant for that big of an import, you use the import words button. It's meant for single words or sentences, maybe a paragraph. If you split it up yourself you be able to use it for that. But I find it hard to believe you have 10k words you can copy and they are all the same state. It sounds like you are using Anki for these words, so why do you need to import that many?

The words I'm importing are each on their own line - just words, no sentences. As for the import button, I noticed that it's for JSON files. Is it able to accept CSV/txt files? I tried to import a txt file before, and nothing seemed to happen. All of my known words come from LUTE, which can export either a txt or a csv. I imported these words into Anki to be able to use them in conjunction with asbplayer.

As for the weird Yomitan parsing, I have no clue why this is happening; it might be due to the number of dictionaries I have installed. I can send you an srt file and cross-check the parsing.

Once again, thank you for adding the Yomitan API. It has greatly improved my workflow and opened up possibilities for many additional features I'd like to discuss in the near future.

@ShanaryS
Collaborator Author

it'll autopause on known words that are incorrectly parsed as separate unknown words

Solving this is currently outside of the design scope. We rely completely on Yomitan parsing, and overriding it is not currently planned. That's why, even if you notice something off and manually try to fix it, having the fix apply to the next subtitle or next session would require us to make decisions about language parsing. The solution is to find alternatives to the standard Yomitan parser (there is a PR for exposing MeCab) that are more accurate.

If you are fine with marking the incorrectly parsed words as known, then it's fine. But bulk marking like this won't be a high priority, as there are more pressing things to focus on.

Is it able to accept CSV/txt files?

If it needs to be parsed in a specific way, then that's probably not going to be a high priority. If it's just a text file, as you say, with words (either manually separated or just sentences), then that will be supported. #857 should help with that.

RonzyOnGIT pushed a commit to RonzyOnGIT/asbplayer that referenced this pull request Feb 2, 2026
* export/import local tokens

* collect tokens locally based on hovered element

* import words from clipboard

* support ignored

* add more helper text for buildAnkiCache and disable if no tracks are enabled

* highlight token on hover

* return objects for dictionary query results
RonzyOnGIT pushed a commit to RonzyOnGIT/asbplayer that referenced this pull request Feb 21, 2026
* export/import local tokens

* collect tokens locally based on hovered element

* import words from clipboard

* support ignored

* add more helper text for buildAnkiCache and disable if no tracks are enabled

* highlight token on hover

* return objects for dictionary query results

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

A function that can mark words we learnt

3 participants