Allow collecting tokens with no dictionary entry by ShanaryS · Pull Request #929 · killergerbah/asbplayer

ShanaryS · 2026-03-08T04:55:50Z

fixes #927

The Yomitan parser tokenizes by scanning left to right and taking the longest dictionary entry it can find at each position. Some tokens can therefore be non-dictionary entries (everything up to the next run of characters that does match an entry), and these of course have no lemmas when we look them up. Such tokens should be created with the token itself as the lemma so that the user can collect them.
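To illustrate how a left-to-right longest-match scan ends up producing tokens with no dictionary entry, here is a minimal sketch. It is not Yomitan's actual implementation: `greedyTokenize`, `Segment`, the `Set`-based dictionary, and the `maxLength` cap are all assumptions made up for this example.

```typescript
// Illustrative sketch only; names and data structures are hypothetical.
interface Segment {
    text: string;
    inDictionary: boolean;
}

function greedyTokenize(text: string, dictionary: Set<string>, maxLength = 8): Segment[] {
    // Work on code points so surrogate pairs are never split.
    const chars = Array.from(text);
    const segments: Segment[] = [];
    let unmatchedStart = 0; // start of the current run that has no entry
    let i = 0;

    // Emit the pending non-dictionary run, if there is one.
    const flushUnmatched = (end: number) => {
        if (end > unmatchedStart) {
            segments.push({ text: chars.slice(unmatchedStart, end).join(''), inDictionary: false });
        }
    };

    while (i < chars.length) {
        // Try the longest candidate first and shrink until something matches.
        let matchLength = 0;
        for (let len = Math.min(maxLength, chars.length - i); len > 0; --len) {
            if (dictionary.has(chars.slice(i, i + len).join(''))) {
                matchLength = len;
                break;
            }
        }

        if (matchLength === 0) {
            // Nothing starts here; keep extending the non-dictionary run.
            i += 1;
            continue;
        }

        flushUnmatched(i);
        segments.push({ text: chars.slice(i, i + matchLength).join(''), inDictionary: true });
        i += matchLength;
        unmatchedStart = i;
    }

    flushUnmatched(chars.length);
    return segments;
}
```

With a toy dictionary containing only 猫, が and 好き, the input ぬぬ猫が好き yields ぬぬ as a single segment with `inDictionary: false`, followed by the three dictionary segments. That leading segment is the kind of token that has no lemma to look up.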

I modified the strategy logic to fall back to exact matching if the lemma is missing. We cannot simply always add the token as a lemma, because every other caller of lemmatize() still needs the lookup to fail when there is no lemma.
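A rough sketch of the idea, under assumptions: `Lemmatize` and `lemmasForCollection` are hypothetical names for illustration and do not mirror the actual signatures in asbplayer. The point is that the fallback lives in the collection path, while the lemmatizer itself keeps returning an empty result for tokens without an entry.

```typescript
// Hypothetical types and names; not asbplayer's actual API.
type Lemmatize = (token: string) => string[];

// Collection path: use the lemmas if there are any, otherwise fall back
// to the exact token so the user can still collect it.
function lemmasForCollection(token: string, lemmatize: Lemmatize): string[] {
    const lemmas = lemmatize(token);
    return lemmas.length > 0 ? lemmas : [token];
}

// Stub lemmatizer for the example: it only knows one inflected form and
// returns an empty array for everything else, as other callers expect.
const stubLemmatize: Lemmatize = (token) => (token === '食べた' ? ['食べる'] : []);
```

Here `lemmasForCollection('食べた', stubLemmatize)` returns the dictionary lemma, while a token the stub does not know comes back as its own lemma. Calling `stubLemmatize` directly on that same token still returns an empty array, so other callers continue to see the failure.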

cloudflare-workers-and-pages · 2026-03-08T04:58:53Z

Deploying asbplayer with Cloudflare Pages

Latest commit:	`fb2e7ed`
Status:	✅ Deploy successful!
Preview URL:	https://7776cebf.asbplayer.pages.dev
Branch Preview URL:	https://collect-ungrouped-segments.asbplayer.pages.dev

common/subtitle-coloring/subtitle-coloring.ts

killergerbah · 2026-03-08T06:42:41Z

I just tested this with the Japanese YT video mentioned in the original issue, and the ー character in ずーっと is not being rendered on this version of the code.

killergerbah · 2026-03-08T06:45:49Z

Ah, never mind, I'm an idiot.

NovaKing001 · 2026-03-08T07:01:09Z

Yeah, that was a miscommunication on my part. I made those subtitles using whisper.cpp lol

ShanaryS self-assigned this Mar 8, 2026

ShanaryS added the bug label Mar 8, 2026

ShanaryS force-pushed the collect-ungrouped-segments branch from fd01470 to 9103dcf Compare March 8, 2026 04:58

ShanaryS force-pushed the collect-ungrouped-segments branch from 9103dcf to 04955da Compare March 8, 2026 06:00

allow collecting tokens with no dictionary entry

fb2e7ed

ShanaryS force-pushed the collect-ungrouped-segments branch from 04955da to fb2e7ed Compare March 8, 2026 06:19

killergerbah reviewed Mar 8, 2026

common/subtitle-coloring/subtitle-coloring.ts (resolved)

killergerbah approved these changes Mar 8, 2026

killergerbah merged commit e357624 into main Mar 8, 2026
2 checks passed

killergerbah deleted the collect-ungrouped-segments branch March 8, 2026 06:45

killergerbah added this to the Extension v1.15.0 milestone Mar 8, 2026

ShanaryS mentioned this pull request Mar 8, 2026

Consistent handling of no token lemmas #930

Merged

killergerbah merged 1 commit into main from collect-ungrouped-segments