feat(dictionary): cache tokens in IndexedDB #844

Merged: killergerbah merged 6 commits into killergerbah:main from ShanaryS:dictionary-db on Jan 8, 2026

@ShanaryS (Collaborator) commented Dec 15, 2025

This eliminates the runtime dependency on Anki (though if Anki is available, it is used to automatically keep the cache in sync). This gives a massive speedup, since we no longer need to query both Anki and Yomitan for the Anki cards (especially sentence cards). Support for local tokens is not exposed to the user in this PR.

  • The IndexedDB is accessed through DictionaryProvider (which works the same way as SettingsProvider).
  • There are 3 stores:
    • meta: used to track the current and previous builds, to know whether settings changed or a build is in progress
    • tokens: stores each word with its lemmas, status, etc.
    • ankiCard: Anki tokens derive their status from this store, as we need to track other things such as suspended status
  • The cache is only rebuilt when a card/note modification time changes, which covers reviews, suspensions, edits, and additions.
    • During playback we still use the existing method to detect new cards, since frequently checking modification times for all cards is undesirable and Anki does not allow querying for cards with a modification time later than a given modifiedAt. This means a cache build is only triggered once for a given card during a playback session (though triggers from subsequent cards will still update everything). This covers the main case of collecting a token; tracking more would be too inefficient.
  • Builds are guarded by a buildId, so concurrent builds are prevented even if a previous build shut down without cleaning up. A build expires when it completes or when more than 5 minutes have passed since its last update (updates occur every few seconds).
  • A build can run while the cache is being queried during playback; updated tokens are sent (unless the build was triggered manually) so that they can be recolored.
  • When a track's settings change, that track is cleared on the next build before being repopulated. If a build is interrupted before completion, it resumes where it left off (including any changes made since).
  • Triggers were added so that exporting/updating an Anki card from SubtitlePlayer starts a cache build immediately. This already happened when the export was initiated from the extension (e.g. via keybind).
  • The settings tab displays helper text when the user changes settings that require a cache rebuild, as well as build progress (build status messages are translatable).
  • Reading annotations can be enabled independently of coloring; the Always and Never options won't trigger cache builds for that track (unless coloring is enabled).
  • Added the ability to restrict tracks to specific decks, in case multiple unrelated decks share the same note type.
  • Anki is disabled for a track by leaving its fields empty, which clears that track from the DB on the next build. Disabled tracks are kept in the DB indefinitely.
  • This requires the Yomitan version with the tokenize optimizations, which significantly improve performance. That should not be a problem, since this feature is unreleased and extensions should auto-update.
  • When settings are updated, SubtitleColoring now only resets its cache for settings that affect it, rather than for all settings. It also clears the currently displayed richText if the user turns the features off, since updating the richText relies on those features being enabled.
  • Coloring is now requested more intelligently: the first request colors only 10 events, and subsequent requests can cover up to 100. Requests are only triggered when the user is about to need another batch, rather than for single events each time a new subtitle is shown.
  • Unlike Anki tokens, local tokens are not per track. This makes it easier for users to switch between tracks and have local tokens behave consistently. If per-track local tokens are desired, they can be trivially added as an opt-in later.
  • Token priority is Local > Anki word > Anki sentence.
  • Added the unlimitedStorage permission, which should keep the DB persisted and prevent users from losing local tokens.
  • Offline resume, i.e. finishing an interrupted build without requiring Anki, was not added. This would only matter for first-time builds.
  • The limitations that still exist are inherent to this problem domain:
    • Homographs can't be distinguished: bat (animal) and bat (baseball) cannot be separated, which is a fundamental limitation of written language. Distinguishing them would require AI for context, which is a non-starter.
    • False friends, i.e. words with the same spelling but different meanings across languages. Not worth addressing, but mitigated by filtering by deck.
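
The rule that the cache is only rebuilt when a modification time changes can be sketched as a diff between the times stored in the cache and the times reported by Anki. This is a hypothetical sketch, not the PR's code: `diffCardModTimes` is an illustrative name, both inputs are assumed to be plain `Map`s of card id to modification time, and fetching them from Anki or reading them from the `ankiCard` store is left out. Note modification times would presumably be handled the same way.

```typescript
// Hypothetical sketch: `cached` would come from the ankiCard store and `fromAnki`
// from Anki itself; both map a card id to its last modification time.

interface CardModTimeDiff {
    changed: number[]; // cached cards whose modification time differs (review, suspend, edit)
    added: number[]; // cards Anki reports that are not cached yet
}

function diffCardModTimes(cached: Map<number, number>, fromAnki: Map<number, number>): CardModTimeDiff {
    const diff: CardModTimeDiff = { changed: [], added: [] };
    fromAnki.forEach((modifiedAt, cardId) => {
        const cachedModifiedAt = cached.get(cardId);
        if (cachedModifiedAt === undefined) {
            diff.added.push(cardId);
        } else if (cachedModifiedAt !== modifiedAt) {
            diff.changed.push(cardId);
        }
    });
    return diff;
}
```

In this sketch, an empty diff means no cards need to be re-tokenized, so a build triggered by an unchanged collection does almost no work.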
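
The buildId guard can be read as a lease: a build may start only if no other build holds an unexpired lease, and a lease expires 5 minutes after its last update. This is a minimal in-memory sketch under that reading; `BuildLease`, `tryAcquire`, `heartbeat`, and `release` are hypothetical names, and the PR keeps this state in the IndexedDB `meta` store rather than in a class field.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not asbplayer's actual code.
// In the PR this state lives in the IndexedDB "meta" store; here it is kept in memory.

const BUILD_EXPIRY_MS = 5 * 60 * 1000; // a build expires 5 minutes after its last update

interface BuildRecord {
    buildId: string;
    lastUpdatedAt: number; // epoch millis of the most recent update
}

class BuildLease {
    private current: BuildRecord | undefined;

    // Returns true if `buildId` now owns the build. A record left behind by a build
    // that shut down without cleanup is treated as free once it has expired.
    tryAcquire(buildId: string, now: number): boolean {
        if (this.current !== undefined && this.current.buildId !== buildId) {
            const expired = now - this.current.lastUpdatedAt > BUILD_EXPIRY_MS;
            if (!expired) {
                return false;
            }
        }
        this.current = { buildId, lastUpdatedAt: now };
        return true;
    }

    // Called every few seconds while a build runs; returns false if the lease was taken over.
    heartbeat(buildId: string, now: number): boolean {
        if (this.current === undefined || this.current.buildId !== buildId) {
            return false;
        }
        this.current.lastUpdatedAt = now;
        return true;
    }

    // Called when a build completes so the next build can start immediately.
    release(buildId: string): void {
        if (this.current !== undefined && this.current.buildId === buildId) {
            this.current = undefined;
        }
    }
}
```

The expiry is what makes the guard safe against shutdowns with no cleanup: a crashed build never releases its lease, but it also stops sending updates, so the lease lapses on its own.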
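
The `meta` store's role in clearing a track when its settings change can be sketched as a per-track comparison between the settings recorded by the previous build and the current ones. A hypothetical sketch: `TrackCacheSettings`, `sameCacheSettings`, and `tracksToClear` are illustrative names, and the fields shown (word fields, sentence fields, decks) are assumed to be the settings that affect the cache.

```typescript
// Hypothetical sketch: the settings assumed to affect a track's cached tokens.
interface TrackCacheSettings {
    wordFields: string[];
    sentenceFields: string[];
    decks: string[];
}

// Order-sensitive comparison: reordering fields counts as a change in this sketch.
function sameList(a: string[], b: string[]): boolean {
    return a.length === b.length && a.every((value, i) => value === b[i]);
}

function sameCacheSettings(a: TrackCacheSettings, b: TrackCacheSettings): boolean {
    return (
        sameList(a.wordFields, b.wordFields) &&
        sameList(a.sentenceFields, b.sentenceFields) &&
        sameList(a.decks, b.decks)
    );
}

// `previous[track]` is what the last build recorded (undefined if the track was never
// built). Returns the tracks whose tokens should be cleared before being repopulated.
function tracksToClear(previous: (TrackCacheSettings | undefined)[], current: TrackCacheSettings[]): number[] {
    const result: number[] = [];
    current.forEach((settings, track) => {
        const prev = previous[track];
        if (prev !== undefined && !sameCacheSettings(prev, settings)) {
            result.push(track);
        }
    });
    return result;
}
```

A track with no previous record has nothing to clear, so it is simply populated by the build.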
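
The batched coloring policy (10 events first, then up to 100, requested only when the next batch is about to be needed) can be sketched as a pure function. The batch sizes come from the description above; `nextColoringBatch` and `REFILL_AHEAD` (how close playback must get to the end of the colored range before asking for more) are hypothetical, and seeking past the colored range is not handled in this sketch.

```typescript
// Hypothetical sketch of the batching policy. FIRST_BATCH_SIZE and MAX_BATCH_SIZE
// follow the PR description; REFILL_AHEAD is an assumed value for illustration.

const FIRST_BATCH_SIZE = 10;
const MAX_BATCH_SIZE = 100;
const REFILL_AHEAD = 5; // assumption: request more once <= 5 colored events remain ahead

interface BatchRequest {
    start: number; // index of the first subtitle event to color
    end: number; // exclusive end index
}

// `coloredUntil` is the exclusive index up to which events are already colored,
// `currentIndex` is the event being shown, and `totalEvents` is the subtitle count.
function nextColoringBatch(coloredUntil: number, currentIndex: number, totalEvents: number): BatchRequest | undefined {
    if (coloredUntil >= totalEvents) {
        return undefined; // everything is already colored
    }
    const isFirstRequest = coloredUntil === 0;
    const remainingAhead = coloredUntil - currentIndex;
    // Only ask for more once playback is about to run out of colored events.
    if (!isFirstRequest && remainingAhead > REFILL_AHEAD) {
        return undefined;
    }
    const size = isFirstRequest ? FIRST_BATCH_SIZE : MAX_BATCH_SIZE;
    return { start: coloredUntil, end: Math.min(coloredUntil + size, totalEvents) };
}
```

The small first batch gets the first subtitles colored quickly, while the larger follow-up batches amortize the cost of each tokenize request.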
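
The token priority (Local > Anki word > Anki sentence) amounts to picking the highest-ranked source among the candidates found for a token. A hypothetical sketch: `TokenSource`, `SOURCE_RANK`, and `resolveTokenStatus` are illustrative names, and `status` is an opaque number rather than the PR's actual status model.

```typescript
// Hypothetical sketch: source names and the status type are illustrative.
type TokenSource = "local" | "ankiWord" | "ankiSentence";

interface TokenStatusCandidate {
    source: TokenSource;
    status: number; // e.g. a maturity level; its meaning is not modeled here
}

// Lower rank wins: Local > Anki word > Anki sentence.
const SOURCE_RANK: Record<TokenSource, number> = { local: 0, ankiWord: 1, ankiSentence: 2 };

function resolveTokenStatus(candidates: TokenStatusCandidate[]): TokenStatusCandidate | undefined {
    let best: TokenStatusCandidate | undefined;
    for (const candidate of candidates) {
        if (best === undefined || SOURCE_RANK[candidate.source] < SOURCE_RANK[best.source]) {
            best = candidate;
        }
    }
    return best;
}
```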

@ShanaryS self-assigned this Dec 15, 2025
@ShanaryS added the enhancement (New feature or request) label Dec 15, 2025
@ShanaryS force-pushed the dictionary-db branch 2 times, most recently from 842fba2 to 78bad00 on December 17, 2025 04:36

@killergerbah (Owner) left a comment

Hey, really appreciate the patience. I'm still trying to read and understand everything, so I've left some high-level feedback for now. Let me know what you think.

@ShanaryS force-pushed the dictionary-db branch 6 times, most recently from 372ab82 to c07bd96 on December 30, 2025 04:19

@killergerbah (Owner) left a comment

Hey, thanks again for the patience. I did another pass and mostly made suggestions to improve readability. I know some of this is probably just a stylistic difference, but there's a lot of code with complex logic that I think can be made more readable at low cost. I'm mostly worried about needing to understand this code when making changes months from now.

.map((track) => `#${track + 1}`)
.join(', '),
})}`;
msg += ` | ${t('settings.dictionaryBuildModifiedCards', { numCards: numUpdatedCards.toLocaleString('en-US') })}`;

@killergerbah (Owner) Dec 31, 2025

Can localization of the progress strings be confined to the dictionary settings form instead? This would require you to model the progress/errors via a data structure, but I think this is probably not so difficult, as there are a limited number of status update types.

While the refactor might be kind of annoying, keeping front-end and logic concerns separate as a matter of principle will give us more flexibility to render the progress in different ways on the settings form, without a dependency on this code.

@ShanaryS (Collaborator, PR author)

That very flexibility was what I was trying to avoid. Some messages only make sense to display in certain states, and constructing that seems like it would be complicated. Also, some messages don't have translations if it's a random error. We would also need to handle translations in SubtitleColoring, as it can log errors whenever tokens update.

I still think it's worth doing, though. I'm thinking of returning an object instead, where the keys correspond to translation keys and the values to translation params, as well as a separate field for unspecified errors.

@killergerbah (Owner)

If you provide both the English string and the translation key, you could use the English string for logging in SubtitleColoring and the translation key for display on the settings form.

@killergerbah (Owner) Jan 1, 2026

But instead of a translation key, maybe provide an enum? That will give us some decoupling of the progress/error type from the loc file.
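
The enum-plus-English-string suggestion can be sketched as a status object produced by the build logic and localized only by the settings form. This is a hypothetical sketch, not the structure that was eventually pushed: `DictionaryBuildStatusType`, `DictionaryBuildStatus`, `modifiedCardsStatus`, and `renderStatus` are illustrative names, and the translate function is passed in with the same `(key, params)` shape as the `t` call in the quoted snippet. Only the key `settings.dictionaryBuildModifiedCards` comes from the PR.

```typescript
// Hypothetical sketch: an enum identifies the update type, params carry the values to
// interpolate, and an English message is kept for logging (e.g. in SubtitleColoring).

enum DictionaryBuildStatusType {
    ModifiedCards = "modifiedCards",
    UnknownError = "unknownError", // e.g. a random error with no translation
}

type StatusParams = Record<string, string | number>;

interface DictionaryBuildStatus {
    type: DictionaryBuildStatusType;
    params: StatusParams;
    englishMessage: string;
}

// Logic side: builds the status without touching the localization file.
function modifiedCardsStatus(numCards: number): DictionaryBuildStatus {
    return {
        type: DictionaryBuildStatusType.ModifiedCards,
        params: { numCards },
        englishMessage: `${numCards} modified card(s)`,
    };
}

// Settings-form side: the only place that maps the enum to a translation key.
type Translate = (key: string, params: StatusParams) => string;

const translationKeyByType: Partial<Record<DictionaryBuildStatusType, string>> = {
    [DictionaryBuildStatusType.ModifiedCards]: "settings.dictionaryBuildModifiedCards",
};

function renderStatus(status: DictionaryBuildStatus, t: Translate): string {
    const key = translationKeyByType[status.type];
    // Statuses without a translation (such as unknown errors) fall back to English.
    return key === undefined ? status.englishMessage : t(key, status.params);
}
```

Keeping the enum-to-key mapping in the form means the loc file can be reorganized without touching the build code, and untranslatable errors still have something to show.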

@ShanaryS (Collaborator, PR author)

I can't get a design that I'm happy with so I think it's best if you make the changes.

@killergerbah (Owner)

Pushed. Let me know if anything looks weird to you.

@ShanaryS (Collaborator, PR author)

It looks good, the only issue I noticed is that the elapsed time changes when a build is done and the button is pressed. I think this lies in how the UI is calculating it but it's not a big deal.

I also re-reviewed the rest of the PR. Your earlier comment about dictionaryProvider was accurate; we could use DictionaryStorage directly. I'm not sure it's worth updating, though. I could see it being useful for statistics.

@killergerbah (Owner)

> It looks good, the only issue I noticed is that the elapsed time changes when a build is done and the button is pressed. I think this lies in how the UI is calculating it but it's not a big deal.

I briefly tried this and couldn't reproduce anything like what you're describing. All I did was finish a cache build, and then press the button again. But then it just shows "Anki track(s): #1 | 0 modified card(s)".

If there is an issue that's easy to fix, we can clean that up. I also want to note the following, which we should probably fix before release:

  • Leaving the "annotations" tab and coming back will reset the button state, even if a build is in progress, until the next build update is received.
  • The button is clickable even if card targeting is not completely set up (e.g. 0 word/sentence fields).
  • We should probably add some more helper text to direct the user to set things up if they enabled "colorize subtitles based on Anki maturity," since some users might expect everything to work without further configuration.

@ShanaryS (Collaborator, PR author)

> All I did was finish a cache build, and then press the button again.

It only happens when at least one card is built, so that the elapsed time is shown. The next time the button is pressed, the elapsed counter is updated; this is fixed by setting the state to undefined when the button is pressed.

> Leaving the "annotations" tab and coming back will reset the button state, even if a build is in progress, until the next build update is received.

Is there an easy fix for this? Otherwise it seems we would need to check the build on page load. But we won't get the updates in all scenarios, such as when the website is opened in a new tab. If there isn't an easy fix, it should be fine to leave it, as only the initial build should take long.

> The button is clickable even if card targeting is not completely set up (e.g. 0 word/sentence fields).

It can be disabled if no tracks are enabled, but empty fields are okay, as that's what clears the DB on the next build. I also don't want anything too distracting if they don't configure it, since it's valid to use only local tokens.

I'll address some of these in the local tokens PR and we can discuss more changes then.

@killergerbah (Owner)

> Is there an easy fix for this? Otherwise it seems we would need to check the build on page load.

Probably. We just need to put the state somewhere outside the tab, maybe.
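
One reading of "put the state somewhere outside the tab" is a small store that outlives the annotations tab, so a remounted tab can read the latest build status immediately instead of waiting for the next update. A hypothetical sketch: `LatestValueStore` is an illustrative name, not an asbplayer class. Pressing the build button could call `set(undefined)` first, matching the fix described for the stale elapsed time. As ShanaryS pointed out, this would not help when the website is opened in a new tab, since that page has its own state.

```typescript
// Hypothetical sketch: a tiny store that lives outside the tab component, so the
// latest value survives the tab unmounting and remounting. Names are illustrative.

type Listener<T> = (value: T | undefined) => void;

class LatestValueStore<T> {
    private value: T | undefined;
    private listeners = new Set<Listener<T>>();

    // A remounted tab reads this on mount instead of starting from an empty state.
    get(): T | undefined {
        return this.value;
    }

    // Stores the latest value (or clears it with undefined) and notifies subscribers.
    set(value: T | undefined): void {
        this.value = value;
        this.listeners.forEach((listener) => listener(value));
    }

    // Returns an unsubscribe function, e.g. for a component's cleanup on unmount.
    subscribe(listener: Listener<T>): () => void {
        this.listeners.add(listener);
        return () => {
            this.listeners.delete(listener);
        };
    }
}
```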

@killergerbah merged commit 9d682bc into killergerbah:main on Jan 8, 2026
1 check passed
@ShanaryS deleted the dictionary-db branch on January 8, 2026 23:27
@killergerbah added this to the Extension v1.14.0 milestone on Jan 19, 2026
RonzyOnGIT pushed a commit to RonzyOnGIT/asbplayer that referenced this pull request Feb 2, 2026
* feat(dictionary): cache tokens in IndexedDB

* use anki decks in addition to anki fields

* use DictionaryProvider

* update DictionaryProvider subscription api and cleanup code

* remove _updateSuspendedCards() and cleanup code

* Anki cache build state updates are structured

---------

Co-authored-by: R-J Lim <kgerbil@gmail.com>
RonzyOnGIT pushed a commit to RonzyOnGIT/asbplayer that referenced this pull request Feb 21, 2026