feat(subtitles): color subtitle words based on Anki intervals #813

Merged
killergerbah merged 16 commits into killergerbah:main from ShanaryS:yomitan-anki
Nov 29, 2025
Conversation

@ShanaryS
Collaborator

@ShanaryS ShanaryS commented Oct 29, 2025

fixes #193
fixes #789

This uses the Yomitan API discussed in the issues. It creates a new tab in the UI called Dictionary where all of these settings are stored. They could all be moved into the existing sections, but that seemed like it would be confusing.


Design:

  1. Call AnkiConnect/Yomitan in a way that avoids CORS issues for the extension
  2. Tokenize text with Yomitan
  3. Treat symbols, punctuation, and numbers as known (i.e. the token doesn't contain a Unicode letter-class character)
  4. Lookup tokens in Anki using word fields
    • Use lemmatized token from Yomitan only if inflected version is uncollected
  5. Lookup tokens in Anki using sentence fields only if word fields are uncollected
    • Tokenize the sentence from Anki and match by tokens to reduce false positives
    • Use lemmatized token from Yomitan only if inflected version is uncollected
  6. Color words based on Anki stability/intervals
    • Stability will be used if the user has FSRS enabled; otherwise we fall back to intervals
    • Tokens are marked with the highest status if there are multiple conflicting cards
    • Suspended cards can always be treated normally or as a specific status
    • If only some cards are suspended for a given token, choose the status from the unsuspended ones only
  7. Cache all of these steps at the per token and per subtitle event level
    • Poll Anki on an interval and when asbplayer creates/updates a card to trigger a recheck of uncollected tokens
    • Never update colors based on changed review or suspended status; such changes are unlikely and rechecking for them is wasteful
    • Strike through tokens in red if the Anki or Yomitan connection fails; automatically color tokens when the connection resumes
    • App and SidePanel listen for an event for when the colors are updated
  8. Subtitle events are tokenized on the fly for the showingSubtitles and a buffer of future events
    • Prioritize building for showingSubtitles by cancelling previous work, keeping processing responsive and relevant (e.g. when seeking)
  9. Support coloring subtitles from the App without extension being installed
    • SubtitleController: Extension | SubtitlePlayer: App
    • SubtitlePlayer always handles coloring the website or SidePanel, listens for extension requests if in use
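
The token-classification and status-resolution rules in steps 3 and 6 can be sketched as below. This is an illustrative sketch only; the names, thresholds, and status values are hypothetical, not asbplayer's actual API.

```typescript
type Status = 'unknown' | 'young' | 'mature';
type SuspendedMode = 'normal' | Status; // how the user chose to treat suspended cards

interface Card {
    interval: number; // days; stability would be used instead when FSRS is enabled
    suspended: boolean;
}

// Step 3: a token with no Unicode letter-class character is treated as known.
const isSymbolic = (token: string): boolean => !/\p{L}/u.test(token);

const statusFromInterval = (interval: number, matureCutoff = 21): Status =>
    interval >= matureCutoff ? 'mature' : interval > 0 ? 'young' : 'unknown';

// Step 6: prefer unsuspended cards; mark the token with the highest status
// among the candidate cards when multiple cards conflict.
function resolveStatus(cards: Card[], suspendedMode: SuspendedMode = 'normal'): Status {
    const rank: Record<Status, number> = { unknown: 0, young: 1, mature: 2 };
    const unsuspended = cards.filter((c) => !c.suspended);
    const pool = unsuspended.length > 0 ? unsuspended : cards;
    const statuses = pool.map((c) =>
        c.suspended && suspendedMode !== 'normal' ? suspendedMode : statusFromInterval(c.interval)
    );
    return statuses.reduce<Status>((a, b) => (rank[a] >= rank[b] ? a : b), 'unknown');
}
```

For example, a token backed by a 30-day card and a 2-day card resolves to mature, but if the 30-day card is suspended, only the unsuspended 2-day card is considered and the token resolves to young.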

UI/Options

  • Using naming that is conscious of future expansion
  • Ability to customize per track (similar to subtitle appearance)
  • Ability to customize the color and line thickness
  • Add options on how to treat suspended cards
  • Ability to control how inflections and lemmas are handled
  • Allow searching multiple word/sentence fields for known status
    • Read fields from Anki and present as autofill multi selection dropdown
  • Ability to change card interval mature cutoff
  • Ability to use a wide variety of coloring options
    • Ability to disable applying style if mature
  • Machine translation of options
  • Hide Dictionary tab on previous versions?

Next PR - Using IndexedDB:

We currently cannot match if the subtitle token is inflected while the card in Anki is inflected differently. For example, the subtitle contains standing but the user only has an Anki card with stood. By parsing the Anki fields ahead of time and storing them in IndexedDB, we can also get their lemmas, allowing us to match using the base form stand. Using IndexedDB will also allow much faster lookups since we won't need to query Anki in real time.

IndexedDB also allows us to manually mark words as known without adding them to Anki. Users will also be able to import known words easily. Users can use these words without Anki, or use both, where the manual/imported ones take priority.

The structure will likely use three "tables": token_local, token_anki_word, and token_anki_sentence.
Each of these "tables" will have three "columns": lemma | status | inflections,
where inflections is JSON with key-value pairs of inflection: status.

An example entry:

| Lemma | Status | Inflections |
| --- | --- | --- |
| run | 1 | {"running": 2, "ran": 3} |

The Lemma "column" will be indexed and will be how lookups are performed. When we tokenize a subtitle event, each token will also be lemmatized and then looked up against the database. Users will be able to choose how inflections or lemmas are used for known status. I'll stop here with the details, but I already have a good idea of how to structure and use the data for this task.
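
The record shape and lookup described above might look roughly like this, using a plain Map in place of an IndexedDB object store indexed on the lemma. All names here are hypothetical sketches of the planned design, not existing code.

```typescript
interface KnownEntry {
    lemma: string;
    status: number; // lemma-level status
    inflections: Record<string, number>; // e.g. { running: 2, ran: 3 }
}

// Stand-in for the IndexedDB store, keyed by the indexed lemma "column".
const store = new Map<string, KnownEntry>();
store.set('run', { lemma: 'run', status: 1, inflections: { running: 2, ran: 3 } });

// Look up a subtitle token: try the exact inflection first, then fall back
// to the lemma-level status (mirroring "use the lemmatized token only if the
// inflected version is uncollected").
function lookup(token: string, lemma: string): number | undefined {
    const entry = store.get(lemma);
    if (!entry) return undefined;
    return entry.inflections[token] ?? entry.status;
}
```

So lookup('ran', 'run') resolves through the inflection map, lookup('runs', 'run') falls back to the lemma-level status, and a lemma with no entry stays uncollected.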

Future PRs:

  • Image subtitles?
  • Keybind to hide/show colors
  • Generate a comprehension score based on known words
  • Auto pause after uncollected word
  • Automatically mine all uncollected words with single click

@ShanaryS ShanaryS force-pushed the yomitan-anki branch 9 times, most recently from 7155a24 to ec3818c Compare October 31, 2025 17:56
@NovaKing001

Hey, this is awesome! I’m the one who originally requested this. I’m not too familiar with coding, but I do have a few requests.

Correct me if I’m wrong, but I see that you’re coloring the words based on Anki intervals. Wouldn’t it be better if the user could upload a TSV file where each word is listed with a value assigned according to its maturity level—similar to how LUTE does it?

For example:

| Term | Status |
| --- | --- |
| 別れ | 4 |
| 別れる | 4 |
| 利いた | 1 |
| 利く | 1 |
| 利用 | 5 |
| | 1 |
| 制度 | 1 |

Also, I think it would be useful to have an “ignore” status for any words the user wishes to exclude, such as fictional words or names.

As for changing a word’s maturity level instantly, I’d like it to work similarly to LUTE, where you can hover over a word and press a number key. For example, pressing “1” would turn an uncollected word into an unknown word.

I’d love to contribute to this request as much as possible. If you need ideas, just ask! Thank you!

@ShanaryS ShanaryS force-pushed the yomitan-anki branch 2 times, most recently from 0e61878 to 5da3281 Compare November 1, 2025 07:42
@ShanaryS
Collaborator Author

ShanaryS commented Nov 1, 2025

Correct me if I’m wrong, but I see that you’re coloring the words based on Anki intervals. Wouldn’t it be better if the user could upload a TSV file where each word is listed with a value assigned according to its maturity level—similar to how LUTE does it?

For 99% of users, no. Most will much prefer the automatic fetching and real-time updates from their existing Anki connection to asbplayer. I think your suggestion has value and I'd be interested in implementing it, but that's up to @killergerbah. It should be very straightforward: it would just replace Anki in this workflow, which would only be a few lines of code, and the option would slot in seamlessly. But either way, this PR is big enough as it is; I expect this review will come with a lot of changes and discussion.

Also, I think it would be useful to have an “ignore” status for any words the user wishes to exclude, such as fictional words or names.

I plan to with Manually mark words as known for ASBPlayer (overrides Anki). Will not be in this PR. I know @killergerbah had some discussions with others before that storing this data will take some planning. I personally think any solution is acceptable as long as it's exported with the backup.

As for changing a word’s maturity level instantly, I’d like it to work similarly to LUTE, where you can hover over a word and press a number key. For example, pressing “1” would turn an uncollected word into an unknown word.

Would probably be implemented at the same time as manually marking words.

I’d love to contribute to this request as much as possible. If you need ideas, just ask! Thank you!

Please comment any other ideas that you have. The more discussion that happens now the easier it will be to plan for the future.

@NovaKing001

For 99% of users, no. Most will much prefer the automatic fetching and real-time updates from their existing Anki connection to asbplayer. I think your suggestion has value and I'd be interested in implementing it, but that's up to @killergerbah. It should be very straightforward—it would just replace Anki in this workflow, which would only be a few lines of code, and the option would slot in seamlessly. But either way, this PR is big enough as it is; I expect this review will come with a lot of changes and discussion.

I do think having Anki integration is great, but I’ll leave this anecdote for future consideration and/or implementation.

Not every known word in my target language is in my Anki deck. I have around 11k known words, but only about 250 cards are in Anki—mostly because I’ve deleted and recreated decks over the years. Some words were never even added to Anki due to repeated exposure through reading multiple books.

Thank you for this PR! I’m really looking forward to its development. I’ll stay in touch with any suggestions I might have.

@ShanaryS ShanaryS marked this pull request as ready for review November 1, 2025 15:52
@artjomsR
Contributor

artjomsR commented Nov 2, 2025

@NovaKing001 #770 adds new functionality in the next release that will allow bulk-adding cards, so that might help

Actually, a question from me - does it take into account whether the card is suspended or not? (E.g. I mark a card as suspended after a while to mark it as "known" in my Anki collection)

@ShanaryS
Collaborator Author

ShanaryS commented Nov 2, 2025

Actually, a question from me - does it take into account whether the card is suspended or not? (E.g. I mark a card as suspended after a while to mark it as "known" in my Anki collection)

Kind of. It's based on the card's interval, so if it's above 21 (or the value you set) it will be marked as known. But it would be easy to add an option for it, like this:

Treat suspended Anki cards as:

  • Normal
  • Mature
  • Young
  • Unknown

@killergerbah
Owner

Thanks @ShanaryS looks like a huge body of work. I'll see if I can take a look by this weekend.

Regarding the suggestions above:
Yeah I agree that Anki already provides a natural way to know the word's "maturity." I'm not sure how most normal users would be able to come up with a text file representing the same information but I could be wrong about this if that's how other software works.

Also agree that it makes sense to defer "mark word known" to a later change. Wonder if there's a clean solution that uses Anki so that all the data is one place.

@ShanaryS ShanaryS force-pushed the yomitan-anki branch 2 times, most recently from 0f21ee4 to f48f325 Compare November 5, 2025 03:39
@ShanaryS
Collaborator Author

ShanaryS commented Nov 5, 2025

Also agree that it makes sense to defer "mark word known" to a later change. Wonder if there's a clean solution that uses Anki so that all the data is one place.

The only way I can think of is to create a deck that asbplayer will add cards to, as there is no way to add a card without it being in a deck. But it would just sit in the user's collection, which is probably not ideal. We could also add cards to the mining deck as suspended, but that might interfere with users' workflows.

I think the best option is just to use localStorage or IndexedDB. We only need two things per word: Word: Integer. Realistically, it would be at most ~20 bytes for a single key-value pair, which is only 200 KB for 10,000 words. This would also be the same cache used when a user uploads a text file with known words, so that implementation is free. I think as long as we export it (possibly to a separate JSON), there is no real concern.
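
As a quick sanity check on that size estimate, a word-to-status map can be serialized the way a backup export might be. The key names here are synthetic placeholders; real words would vary in length, but the order of magnitude holds.

```typescript
// Build a hypothetical known-words map of 10,000 entries.
const knownWords: Record<string, number> = {};
for (let i = 0; i < 10_000; i++) {
    knownWords[`word${i}`] = (i % 5) + 1; // illustrative status 1-5
}

// Serialize as a backup export would; each `"wordN":S` entry is only a
// handful of bytes, so the whole export lands around 126 KB here,
// comfortably within the ~200 KB ballpark estimated above.
const exported = JSON.stringify(knownWords);
console.log(`${(exported.length / 1024).toFixed(1)} KB for 10,000 words`);
```
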

@NovaKing001

I'm not sure how most normal users would be able to come up with a text file representing the same information but I could be wrong about this if that's how other software works.

Programs like LUTE, LingQ, Migaku, and Bunpro allow users to export their known vocabulary. This gives users the ability to migrate their progress between platforms.

It doesn’t necessarily need to be a text file. For instance, Migaku has a feature where users can simply copy and paste their known word list.

Of course, this could be an issue for intermediate learners who rely exclusively on Anki to learn vocabulary. I would suggest keeping the known words based on Anki intervals while also adding an option to import words directly from Anki.

An example from Migaku (courtesy of Jouzu Juls from YouTube)


Wonder if there's a clean solution that uses Anki so that all the data is one place.

The only other way I see, other than creating a deck where all your info is stored, is to create an Anki add-on where all that information is handled. However, that would be a major undertaking and quite a hassle to implement, which is why I'd rather go for the solution I mentioned previously.

Here's a quick concept I made that would allow the user to upload their known words list while also being able to import words from Anki.

I have some other ideas, but I think it would be better to create a separate issue that outlines them all and keep this PR focused on implementing the Yomitan API. Thank you!

@killergerbah
Owner

killergerbah commented Nov 6, 2025

@NovaKing001 Thanks, I see now that we would benefit from being able to import word lists from other platforms. And also be able to export our own from asbplayer or Anki. I think the main decision to be made is where we store this word data, if not in Anki. @ShanaryS suggested IndexedDB or local storage. Of those two I would prefer IndexedDB, but I'm still not sure if I prefer that against having our own backend.

@ShanaryS I'm just starting to read the code so let me know if I'm misunderstanding anything. I just want to give some high-level design feedback as early as possible.

  • I see that color state is maintained by the extension. I think users should be able to use this feature from the app without the extension, and without having to load subtitles via the extension. For example, the asbplayer app can easily be used standalone to load both subtitles and/or video. I understand that your approach avoids CORS issues, but given the choice between installing the extension and configuring AnkiConnect settings to use this feature from the app, I think it's less friction to just configure AnkiConnect. Some suggestions for how this could work:
    • The coloring logic can be extracted into the common workspace so that it can be used by both the app and extension independently.
    • If the subtitles originate from the app side the app should color the subtitles. If they originate from the extension side, the extension should color the subtitles. You'll notice that we have some special logic triggered in Player.tsx depending on whether the subtitles are coming from extension or not. Similar logic could be used to implement the decision above.
  • I see that colors are requested separately from the subtitles themselves. Since the SubtitleModel already has an additional field coloredText I think the code will be simplified if the only state to maintain is the subtitles list itself. Which is to say, subtitles and the additional color state can be kept together without passing each one around separately.

@ShanaryS
Collaborator Author

ShanaryS commented Nov 7, 2025

Of those two I would prefer IndexedDB, but I'm still not sure if I prefer that against having our own backend.

I've read your blog post about adding a backend to asbplayer. The features it would allow in a single package would be nice, but I'm not sure there is a huge market for it. To me, the biggest benefit of FOSS is that it's community-driven and builds on top of and with each other. IMO there are not a lot of people who are willing to mine words and review them with flashcards but consider it a dealbreaker if it's not all in one app.

Eventually it would be nice to see, but I think such a thing is probably years away with many hours of work ahead. I think in the short term, being able to mark/import words as known and save them locally without adding them to Anki would capture most of the value in the meantime.

@ShanaryS ShanaryS force-pushed the yomitan-anki branch 2 times, most recently from f99c299 to b820d9f Compare November 7, 2025 16:14
@ShanaryS ShanaryS force-pushed the yomitan-anki branch 2 times, most recently from 8879358 to 37f1ffd Compare November 24, 2025 23:39
Owner

@killergerbah killergerbah left a comment


@ShanaryS I'm not going to request any more changes. I'll plan to merge in the next few days. I think there are some open questions still:

  • Should we disable sentence field targeting until we have a solution that's fast enough? A limit of 100 actually still seems too high. I've been waiting 15 minutes and haven't seen any subtitle get colorized.
  • I think even if we start using IndexedDB, tokenizing an entire deck might take hours. At least on my computer, it can take seconds per tokenize request. We'll need a way to tokenize sentences very quickly. Maybe we could concatenate sentences together...

@ShanaryS
Collaborator Author

Should we disable sentence field targeting until we have a solution that's fast enough? A limit of 100 actually still seems too high. I've been waiting 15 minutes and haven't seen any subtitle get colorized.

It should never take that long for a single color; maybe something else is wrong. But I reduced it to a max of 10 now. I think it's better to keep it with a low limit than to remove it. IndexedDB will completely solve this at runtime, but it will still persist (though manageably) when building.

I think even if we start using IndexedDB, tokenizing an entire deck might take hours. At least on my computer, it can take seconds per tokenize request. We'll need a way to tokenize sentences very quickly. Maybe we could concatenate sentences together...

I've tried pretty much everything to improve it. Sending a single request with the concatenated text is ~3% slower and we lose the realtime updates from individual requests. I've also done numerous changes on Yomitan's side and the only real gain is using an LRU cache. If it's taking seconds, then I imagine your sentence fields have very long sentences? It's usually around 100-400ms for me.

Even if it takes hours to fully build the Anki status, it should be manageable as we only need to do it the first time. We wouldn't need a persistent Anki connection either and would easily be able to resume/background the work. Then we only have to do work on new/edited cards; review or suspension status won't affect this and would be a quick update for the changed cards. When in use, the only time spent will be tokenizing/lemmatizing the subtitles. It will take minutes for a 3-hour movie, but we only need to do it all at once if we are calculating statistics.

Overall I think the final user experience will be perfectly acceptable and well understood by users. They only need the first-time setup once, and all future updates will be done in seconds. The subtitles are colored significantly faster than real time; users only need to wait if we implement statistics (we could even have it update live, as the value at 10% completion is likely similar to 100%).

I'm not going to request any more changes. I'll plan to merge in the next few days

I'll likely do a final review tomorrow. If I have anything else I'll let you know otherwise it's good from my side.

@ShanaryS
Collaborator Author

I've been waiting 15 minutes and haven't seen any subtitle get colorized.

There was a race condition when updating the subtitles, but it should only happen when there is a config error (e.g. no Anki/Yomitan). If lots of colors changed quickly, none of the colors got updated for SubtitlePlayer. This may not be what happened, but I don't think it should ever take more than a couple of seconds for the first one.

@killergerbah
Owner

killergerbah commented Nov 27, 2025

@ShanaryS It doesn't seem that surprising to me. At the previous cap of 100, you could query potentially 100 cards for each common word in a subtitle. Then you would need to tokenize each of those 100 cards. On my computer, it takes on average 5 seconds to tokenize the sentence of one card. That's (100 cards) * (N common words in sentence) * (5 seconds) ~ 500N seconds for a single sentence. Of course I'm assuming the Yomitan cache gets missed every time but with enough cards there would be a lot of misses.

I might be oversimplifying a bit, but my tokenize latencies look like this:


@killergerbah
Owner

By the way, I'll plan to merge by tomorrow morning which is Saturday for me.

@ShanaryS
Collaborator Author

I've actually made some more improvements to tokenize. It's now about 2.5x faster and also runs in parallel now. So depending on where the bottleneck is for you, it might actually not be that long.

@killergerbah
Owner

I've tried pretty much everything to improve it. Sending a single request with the concatenated text is ~3% slower and we lose the realtime updates from individual requests. I've also done numerous changes on Yomitan's side and the only real gain is using an LRU cache. If it's taking seconds, then I imagine your sentence fields have very long sentences? It's usually around 100-400ms for me.

I might have installed too many dictionaries. My sentences aren't that long. I'll have to try experimenting later. Here's one that took 9 seconds:

{text: "誇り高き戦士であるこの私がかわいいとか、そんな浮ついた気持ちになったりしないんだからな", scanLength: 16}

Even if it takes hours to fully build the Anki status, it should be manageable as we only need to do it the first time. We wouldn't need a persistent Anki connection either and would easily be able to resume/background the work. Then we only have to do work on new/edited cards; review or suspension status won't affect this and would be a quick update for the changed cards. When in use, the only time spent will be tokenizing/lemmatizing the subtitles. It will take minutes for a 3-hour movie, but we only need to do it all at once if we are calculating statistics.

Overall I think the final user experience will be perfectly acceptable and well understood by users. They only need the first-time setup once, and all future updates will be done in seconds. The subtitles are colored significantly faster than real time; users only need to wait if we implement statistics (we could even have it update live, as the value at 10% completion is likely similar to 100%).

Yeah, agreed; if we can solve IndexedDB, everything should be good. As a last resort we could always replace the Yomitan API with anything else that implements lemmatize and tokenize.

@ShanaryS
Collaborator Author

I might have installed too many dictionaries. My sentences aren't that long. I'll have to try experimenting later. Here's one that took 9 seconds:

This took 333ms for me which lines up with this kind of length. I have 20 dictionaries enabled and back when I tested it weeks ago it didn't make much of a difference.

I've also made enough improvements to Yomitan to get the tokenize + lemmatize from 5m18s to 2m5s for a 3 hour movie subtitle. I don't think there is much else to do without true multi-threading or an algorithm change. I ideally would have liked 30s or less but this is much more palatable.

@JSchoreels

I've tried pretty much everything to improve it. Sending a single request with the concatenated text is ~3% slower and we lose the realtime updates from individual requests. I've also done numerous changes on Yomitan's side and the only real gain is using an LRU cache. If it's taking seconds, then I imagine your sentence fields have very long sentences? It's usually around 100-400ms for me.

I might have installed too many dictionaries. My sentences aren't that long. I'll have to try experimenting later. Here's one that took 9 seconds:

{text: "誇り高き戦士であるこの私がかわいいとか、そんな浮ついた気持ちになったりしないんだからな", scanLength: 16}

Even if it takes hours to fully build the Anki status, it should be manageable as we only need to do it the first time. We wouldn't need a persistent Anki connection either and would easily be able to resume/background the work. Then we only have to do work on new/edited cards; review or suspension status won't affect this and would be a quick update for the changed cards. When in use, the only time spent will be tokenizing/lemmatizing the subtitles. It will take minutes for a 3-hour movie, but we only need to do it all at once if we are calculating statistics.
Overall I think the final user experience will be perfectly acceptable and well understood by users. They only need the first-time setup once, and all future updates will be done in seconds. The subtitles are colored significantly faster than real time; users only need to wait if we implement statistics (we could even have it update live, as the value at 10% completion is likely similar to 100%).

Yeah agreed, if we can solve IndexedDb everything should be good. As a final resort we could always replace Yomitan API with anything else that implements lemmatize and tokenize.

Locally I'm running Yomitan with MeCab installed and I can tokenize huge documents in a matter of seconds.


Basically we're talking ~2-3 ms per block (~128 chars for now).

If we compare the "simple" and "mecab" tokenizers inside Yomitan, we can process the full Oppenheimer subtitle file in about 6s instead of around 1.5 minutes (by bulking):

# SIMPLE
Summary:
  total blocks processed: 11
  total subtitle entries: 3243
  total API time: 95978.8 ms
  avg per block : 8725.3 ms
  wall-clock     : 95994.0 ms
  overall avg ratio (subtime/proc): 111.922
# MECAB
Summary:
  total blocks processed: 11
  total subtitle entries: 3243
  total API time: 601.2 ms
  avg per block : 54.7 ms
  wall-clock     : 617.1 ms
  overall avg ratio (subtime/proc): 17575.056

The main difference is that Yomitan's simple tokenizer explores the text from left to right, brute-forcing its way through all potential conjugations. Instead, MeCab already gives me something like this:

もう一度、聞くわ。──どうして私を、『嫉妬の魔女』の名で呼ぶの
もう一度	副詞,一般,*,*,*,*,もう一度,モウイチド,モーイチド
、	記号,読点,*,*,*,*,、,、,、
聞く	動詞,自立,*,*,五段・カ行イ音便,基本形,聞く,キク,キク
わ	助詞,終助詞,*,*,*,*,わ,ワ,ワ
。	記号,句点,*,*,*,*,。,。,。
─	記号,一般,*,*,*,*,─,─,─
─	記号,一般,*,*,*,*,─,─,─
どうして	副詞,一般,*,*,*,*,どうして,ドウシテ,ドーシテ
私	名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
、	記号,読点,*,*,*,*,、,、,、
『	記号,括弧開,*,*,*,*,『,『,『
嫉妬	名詞,サ変接続,*,*,*,*,嫉妬,シット,シット
の	助詞,連体化,*,*,*,*,の,ノ,ノ
魔女	名詞,一般,*,*,*,*,魔女,マジョ,マジョ
』	記号,括弧閉,*,*,*,*,』,』,』
の	助詞,連体化,*,*,*,*,の,ノ,ノ
名	名詞,一般,*,*,*,*,名,ナ,ナ
で	助詞,格助詞,一般,*,*,*,で,デ,デ
呼ぶ	動詞,自立,*,*,五段・バ行,基本形,呼ぶ,ヨブ,ヨブ
の	助詞,終助詞,*,*,*,*,の,ノ,ノ

You need to interpret that a little bit, but quite simple post-processing of MeCab's output gets usable results:

const shouldMerge = (
    // 助動詞 or 動詞-接尾 (but not after 記号)
    ((tokenPos === '助動詞' || (tokenPos === '動詞' && tokenPos2 === '接尾')) && last_token.pos !== '記号') ||
    // て/で particle after verb
    (tokenPos === '助詞' && tokenPos2 === '接続助詞' && (term === 'て' || term === 'で') && last_token.pos === '動詞')
);
if (shouldMerge) {
    line.pop();
    term = last_token.term + term;
    reading = last_token.reading + reading;
    source = last_token.source + source;
}

https://github.com/yomidevs/yomitan/blob/9701ef241b29d23e0ed96d77ad9ccae4f628fc6c/ext/js/comm/mecab.js#L226-L236

This then gives you something like this

Testing Parsing for sentence: この世界の片隅に
mecab      : この|世界|の|片隅|に
simple     : この世|界|の|片隅|に

Testing Parsing for sentence: ぐらい上目遣いで言った方がやる気出るぜ?
mecab      : ぐらい|上目遣い|で|言った|方|が|やる気|出る|ぜ|?
simple     : ぐらい|上目遣い|で|言った|方|がや|る|気|出る|ぜ|?

Testing Parsing for sentence: 奇襲でもされたときに君が真っ先にやられると全滅確定
mecab      : 奇襲|で|も|された|とき|に|君|が|真っ先|に|やられる|と|全滅|確定
simple     : 奇襲|でも|された|ときに|君|が|真っ先に|やられる|と|全滅|確定

Testing Parsing for sentence: でも、頑張って
mecab      : でも|、|頑張って
simple     : でも|、|頑張って

Testing Parsing for sentence: そっかそっか。ならま、いいんじゃねーかな
mecab      : そっ|か|そっ|か|。|なら|ま|、|いい|ん|じゃ|ねー|か|な
simple     : そっか|そっか|。|なら|ま|、|いいん|じゃねー|かな

Testing Parsing for sentence: そっかそっか
mecab      : そっ|か|そっ|か
simple     : そっか|そっか

Testing Parsing for sentence: もう一度、聞くわ。──どうして私を、『嫉妬の魔女』の名で呼ぶの
mecab      : もう一度|、|聞く|わ|。|─|─|どうして|私|を|、|『|嫉妬|の|魔女|』|の|名|で|呼ぶ|の
simple     : もう一度|、|聞く|わ|。──|どうして|私|を|、『|嫉妬|の|魔女|』|の|名|で|呼ぶ|の

Testing Parsing for sentence: 立ち止まった少女に人混みをかき分けて歩み寄り
mecab      : 立ち止まった|少女|に|人混み|を|かき分けて|歩み寄り
simple     : 立ち止まった|少女|に|人混み|を|かき分けて|歩み寄り

Testing Parsing for sentence: Jonathanです
mecab      : Jonathan|です
simple     : Jonathan|です

So you see small differences, like how そっか becomes そっ|か, but that doesn't prevent the user from still getting そっか as a lookup result (or when we do a searchTerms on it).

I'm currently also using this branch locally and have plugged my asbplayer into this Yomitan branch. Unfortunately, to make this work on the main branch, this PR needs to be merged first: yomidevs/yomitan-mecab-installer#11, since the MeCab integration is just not working at all at the moment.

Once this one is merged, I could then propose my previous PR to Yomitan itself.

Note that if this is too heavy a setup (which it is, in my opinion), similar results could be obtained by directly embedding tokenizer libraries like kuromoji (https://github.com/takuyaa/kuromoji.js) in asbplayer. Lookups would still have to go to Yomitan of course, since the dictionaries are there, but /tokenize doesn't need them if you use a tokenizer like MeCab/kuromoji.

For now, I'm already happy doing that on my own build, but if it's something that interests you or you want some help trying to integrate things like that in the future, feel free to ping me :)

@killergerbah killergerbah merged commit 24a1d1a into killergerbah:main Nov 29, 2025
1 check passed
@killergerbah
Owner

@JSchoreels I see, so MeCab is way faster. Hope to see your work merged soon. Also hoping that that's the problem I'm experiencing. I'm seeing a wide distribution of latencies on my laptop, which makes me think that there's another problem (besides Yomitan) as well.


Development

Successfully merging this pull request may close these issues.

  • Add communication to Yomitan via the Yomitan API
  • Tokenization/lemmatization integration
