
Yomine

A Japanese vocabulary mining tool for extracting the most useful words from real content.

It analyzes subtitle files, integrates with ASBPlayer/MPV for direct video navigation, provides flexible ranking and filtering, and automatically hides words you already know using Anki.

Written in Rust 🦀

Usage GIF


Status

🚧 This project is under active development and may be buggy.
The macOS and Linux binaries have not been extensively tested.

Quick Start

  1. Download the latest release for your platform from Releases
  2. Connect to Anki - Anki Setup
  3. Connect to a Video Player - Either:
    • ASBPlayer: In ASBPlayer, MISC -> Enable WebSocket client
    • MPV: Start MPV with --input-ipc-server=/tmp/mpv-socket

That's it! Yomine will segment the text, rank terms by frequency, and show you vocabulary and expressions to learn.

Features

  • Vocabulary extraction from Japanese subtitle files (words and expressions)
  • Frequency-based ranking to prioritize terms
  • Anki integration to filter out words you already know
  • Video player integration (ASBPlayer and MPV) for timestamp navigation
  • Term analysis with readings, part-of-speech, and context sentences
  • Multi-sentence browsing to see multiple example sentences per term
  • Ignore list to hide unwanted terms from your mining results
  • Comprehensibility scoring - Sentence difficulty estimation based on your Anki card intervals
  • Advanced filtering - Filter vocabulary by part-of-speech and frequency ranges
  • Dictionary weighting - Customize which frequency sources are prioritized
  • Sorting and searching - Sort by frequency, chronological order, sentence count, or comprehension level; search for specific terms
  • Multiple subtitle formats - Supports SRT, ASS, and SSA subtitle files
  • Frequency Analyzer Tool - Generate your own frequency dictionaries.

Installation

Download Prebuilt Binary (Recommended)

  1. Go to Releases
  2. Download the appropriate file for your system:
    • Windows: yomine-*-windows-x64.exe
    • macOS: yomine-*-macos-universal (Intel & Apple Silicon)
    • Linux: yomine-*-linux-x64
  3. Run the executable

Configuration

Setting Up Frequency Dictionaries

Yomine uses frequency dictionaries to rank vocabulary by importance and to improve text segmentation. It automatically downloads JPDB v2.2 Frequency Kana, but you can add as many dictionaries as you like, then toggle and weight each one individually.

Adding Dictionaries:

  1. In Yomine, go to File → Load New Frequency Dictionaries
  2. Select zip files containing Yomitan-compatible frequency dictionaries
  3. Restart when prompted

Recommended Dictionaries:

  • JPDB v2.2 Frequency Kana: ★ Automagically downloaded and installed ★
  • BCCWJ: Based on the Balanced Corpus of Contemporary Written Japanese
  • CC100: List from Common Crawl data

More dictionaries: Marv's collection and Shoui's collection

Generate Your Own:

You can also generate your own custom frequency dictionaries directly inside Yomine using the built-in Frequency Analyzer tool.

Note: Always download frequency dictionaries from trusted sources to avoid corrupted or malicious files. If you can't find a specific dictionary, consider generating your own. You may want to ask around on the TMW Discord as well.
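
As a rough illustration of what toggling and weighting dictionaries could mean for ranking, here is a minimal sketch that merges per-dictionary ranks into one score with a weighted mean. This is an assumption made for illustration, not Yomine's actual formula; the `DictRank` type and the `combined_rank` function are hypothetical names.

```rust
/// One frequency dictionary's entry for a term (hypothetical type).
struct DictRank {
    rank: u32,     // 1 = most frequent
    weight: f64,   // user-assigned weight for this dictionary
    enabled: bool, // whether the dictionary is toggled on
}

/// Weighted arithmetic mean of the ranks from all enabled dictionaries.
/// Returns None when no enabled dictionary with a positive weight has the term.
/// Assumption: a plain weighted mean; the real tool may combine ranks differently.
fn combined_rank(entries: &[DictRank]) -> Option<f64> {
    let mut weighted_sum = 0.0;
    let mut total_weight = 0.0;
    for e in entries.iter().filter(|e| e.enabled && e.weight > 0.0) {
        weighted_sum += e.weight * f64::from(e.rank);
        total_weight += e.weight;
    }
    if total_weight > 0.0 {
        Some(weighted_sum / total_weight)
    } else {
        None
    }
}
```

With this sketch, a term ranked 100 in a dictionary of weight 1.0 and 400 in a dictionary of weight 3.0 gets a combined rank of 325, so the more heavily weighted dictionary pulls the result toward its own ranking.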

Setting Up Anki Integration

Yomine connects to Anki to filter out terms you already know.

Prerequisites:

  1. Install the AnkiConnect add-on in Anki
    • In Anki: Tools → Add-ons → Get Add-ons → Enter code 2055492159
    • Restart Anki

Configuration:

  1. In Yomine: Settings → Anki Settings

  2. Wait for connection to establish

  3. For each note type:

    • Select from dropdown
    • Choose Term Field (Japanese word/phrase)
    • Choose Reading Field (pronunciation)
    • Click "Add" to save mapping

    Note: Yomine will try to guess the correct fields for you
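
If the connection never establishes, it can help to check that AnkiConnect is reachable at all. The sketch below sends AnkiConnect's `version` action to its default address (127.0.0.1:8765) using only the Rust standard library. It is a diagnostic sketch, not how Yomine itself talks to Anki; `build_post` and `query_version` are hypothetical helper names, and the code assumes the server closes the connection after responding.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// AnkiConnect's default listen address.
const ANKI_CONNECT_ADDR: &str = "127.0.0.1:8765";

/// Builds a raw HTTP/1.1 POST request carrying an AnkiConnect JSON payload.
fn build_post(json_body: &str) -> String {
    format!(
        "POST / HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        host = ANKI_CONNECT_ADDR,
        len = json_body.len(),
        body = json_body
    )
}

/// Sends the `version` action and returns the raw HTTP response text.
/// Assumption: the server closes the connection once it has replied,
/// which is what lets `read_to_string` return.
fn query_version() -> std::io::Result<String> {
    let mut stream = TcpStream::connect(ANKI_CONNECT_ADDR)?;
    let request = build_post(r#"{"action":"version","version":6}"#);
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `query_version()` while Anki is running with the add-on installed should return an HTTP response whose JSON body has `result` and `error` fields; a connection error usually means Anki is closed or the add-on is not installed.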

Anki Setup

Configuring WebSocket Connection

Yomine uses WebSocket to communicate with ASBPlayer for timestamp navigation.

Default Setup:

  • Yomine runs a WebSocket server on port 8766
  • In ASBPlayer: MISC → Enable WebSocket Client

Changing the Port:

  1. In Yomine: Settings → WebSocket Settings
  2. Change the port to something else (8767, 8768, 1111, 5353, etc.)
  3. Click "Save and Restart Server"
  4. In ASBPlayer: MISC → WebSocket Server URL → enter ws://localhost:YOUR_PORT

MPV Player Integration

Yomine can also integrate directly with MPV player for timestamp navigation, providing an alternative to ASBPlayer.

Setup:

  1. Start MPV with IPC server enabled:

    mpv --input-ipc-server=/tmp/mpv-socket your-video-file.mkv

  2. Yomine will automatically detect when MPV is running and switch to MPV mode

  3. When MPV is detected, the WebSocket server will be automatically stopped

  4. When MPV is closed, Yomine will automatically restart the WebSocket server for ASBPlayer

Note: You can add input-ipc-server=/tmp/mpv-socket to your MPV configuration file to enable IPC by default.
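
For a sense of what timestamp navigation over this socket involves, here is a minimal sketch that sends a `seek` command through mpv's JSON IPC interface, where each command is a single JSON object terminated by a newline. It is an illustration rather than Yomine's own code; `seek_command` and `seek_mpv` are hypothetical names, and the socket path is assumed to match the one passed to `--input-ipc-server`. The sketch covers Unix sockets only (mpv uses named pipes on Windows).

```rust
/// Formats an mpv JSON IPC command that seeks to an absolute position in seconds.
/// Callers should pass a finite value; NaN or infinity would not be valid JSON.
fn seek_command(seconds: f64) -> String {
    format!("{{\"command\":[\"seek\",{},\"absolute\"]}}\n", seconds)
}

/// Connects to the mpv IPC socket and sends one seek command.
/// Assumption: mpv was started with --input-ipc-server=<socket_path>.
#[cfg(unix)]
fn seek_mpv(socket_path: &str, seconds: f64) -> std::io::Result<()> {
    use std::io::Write;
    use std::os::unix::net::UnixStream;

    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(seek_command(seconds).as_bytes())
}
```

For example, `seek_mpv("/tmp/mpv-socket", 83.5)` would ask a running mpv instance to jump to 1 minute 23.5 seconds into the video.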

Managing Your Ignore List

The ignore list lets you hide terms you don't want to see from your mining results.

Adding Terms to Ignore List:

  1. Right-click any term in the main vocabulary table
  2. Select "Add to Ignore List"
  3. The term will be hidden from future mining sessions

Managing the Ignore List:

  1. Go to Settings → Ignore List Settings
  2. View all ignored terms in the list
  3. Remove terms by clicking the red "x"

Roadmap

Completed

  • Anki Integration Customization
  • Prebuilt Binaries
  • Multi-Sentence Browsing - View multiple example sentences per term
  • Ignore List - Hide unwanted terms from mining results
  • Comprehensibility Scoring - Sentence difficulty estimation based on Anki intervals
  • Advanced Filtering - Filter by part-of-speech and frequency ranges
  • Custom Frequency Lists: Generate dictionaries from your own content

Planned

  • Improved Segmentation: Better text parsing and part-of-speech tagging
  • More File Types: Support for eBooks, web pages, etc.

FAQ

What is vocabulary mining?

It's the process of extracting unknown words and expressions from native content (videos, books, etc.) to create targeted study materials. This approach focuses on vocabulary that's relevant to content you want to understand, rather than studying random word lists.

How should I use this tool?

I prefer post-input mining: after watching a video or episode, I add it to a todo list. Then, whenever I have time, I can review the content and extract terms I want to add to my Anki mining deck. This helps me stay focused on enjoying the content while watching, knowing I can come back to mine vocabulary later.

Yomine?

The name comes from 読み ("yomi" for reading) + "mine" (as in mining vocabulary).

Building from Source

Prerequisites: Git and a Rust toolchain (cargo), as used in the steps below.

Steps:

    git clone https://github.com/mcgrizzz/Yomine.git
    cd Yomine
    cargo build --release
    cargo run --release

License

Yomine is licensed under MIT OR Apache-2.0

Author and maintainer: @mcgrizzz

Key Dependencies:


Happy Mining! ⛏️ 頑張りましょう!
