macos : fix installation to correctly target the installation folder for mecab.py and dictionnaries #11

Merged
Kuuuube merged 1 commit into yomidevs:master from JSchoreels:fix/mecab-empty-result-relative-data-path
Dec 26, 2025

@JSchoreels

@JSchoreels JSchoreels commented Nov 19, 2025

While experimenting with the Yomitan API, I noticed that turning mecab on or off made no difference to the output, even though the Yomitan code itself suggests the sentence should be parsed differently.

After investigating, I realized that the mecab integration, while reporting "Successfully connected", was in fact returning empty arrays.

After troubleshooting, I found the cause: DIR is set relative to the location of mecab.py, so the script could not find the dictionaries that were downloaded into the cloned repository.

By adding an env variable (and suggesting that users simply hardcode the path if that is easier for them), I was able to get mecab parsing to work.

Before: the parsed result is an empty array.
CleanShot 2025-11-19 at 10 21 03@2x

After: the result is now populated correctly.
CleanShot 2025-11-19 at 11 30 42@2x

@YukiNagat0

YukiNagat0 commented Nov 29, 2025

I cannot reproduce the issue described in this PR.
On Windows everything works as expected.

Screenshot 2025-11-22 202402

(screenshot of the dev tools on main branch)

@YukiNagat0

YukiNagat0 commented Nov 30, 2025

I just don't understand why you need to change the value of the DIR variable.
Its only purpose is to point to the directory where install_mecab_for_yomitan.py, mecab.py and the other files are located. Among them is the data directory with the dictionary that mecab uses.

@Kuuuube
Member

Kuuuube commented Nov 30, 2025

It is worth noting that this PR does not force you to do this setup. It should fall back on the previous behavior.

Personally I don't have a problem allowing this in, but I am a bit worried about complicating the instructions, since it may not be necessary. If you can provide some info on what system you're on, that might help us point this out to the more specific range of users that may need it.

@YukiNagat0

YukiNagat0 commented Nov 30, 2025

The problem with the env part of this PR is that it does nothing: DIR should still point to a directory with the following structure:

.
 |-.gitignore
 |-data
 | |-unidic-mecab-translate
 | | |-BSD
 | | |-char.bin
 | | |-COPYING
 | | |-dicrc
 | | |-GPL
 | | |-LGPL
 | | |-matrix.bin
 | | |-sys.dic
 | | |-unk.dic
 |-demo.gif
 |-install_mecab_for_yomitan.py
 |-LICENSE
 |-mecab.py
 |-mecabrc
 |-mecab_yomitan.bat
 |-README.md
 |-yomitan_mecab.json

So what's the point of using the env variable if you already have this directory after cloning the repository?

@YukiNagat0

YukiNagat0 commented Nov 30, 2025

If you want to store the dictionary data somewhere else, then you should use a symbolic link.

@JSchoreels
Author

JSchoreels commented Nov 30, 2025

Well, on my end, every time a call was made to mecab, it returned an empty array. After a few hours of debugging, I finally realized it was looking for the dictionaries in the folder given by DIR = os.path.realpath(os.path.dirname(__file__))

DIR is later used in the script, in a loop that looks for dictionaries:

    def start_mecabs(self):
        for dictionary_name in Mecab.dictionaries:
            if os.path.isdir(os.path.join(DIR, 'data', dictionary_name)):
                self.mecabs[dictionary_name] = Mecab(dictionary_name)

The thing is, the mecab.py that gets executed is not the one in this repository folder; it is the copy placed in the chrome-extension folder.

    if platform_data['platform'] == 'mac':
        script_path = os.path.join(manifest_install_data['path'], 'mecab.py')
        try:
            shutil.copy(os.path.join(DIR, 'mecab.py'), script_path)

As a result, it finds no "mecabs", so tokenizing never succeeds.
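
To illustrate the failure mode, here is a minimal sketch (not the actual mecab.py code) of a lookup keyed on the script's own directory. The helper name `find_dictionaries` and the throwaway temp folders are assumptions made for the example; only the `data/<dictionary_name>` layout comes from the repository.

```python
import os
import shutil
import tempfile


def find_dictionaries(script_path, dictionary_names):
    """Return the dictionaries found in 'data' next to script_path (hypothetical helper)."""
    script_dir = os.path.realpath(os.path.dirname(script_path))
    return [
        name for name in dictionary_names
        if os.path.isdir(os.path.join(script_dir, 'data', name))
    ]


# Stand-in for the cloned repository: mecab.py plus data/unidic-mecab-translate.
repo = tempfile.mkdtemp()
os.makedirs(os.path.join(repo, 'data', 'unidic-mecab-translate'))
original = os.path.join(repo, 'mecab.py')
open(original, 'w').close()

# Stand-in for NativeMessagingHosts: only the script is copied, not 'data'.
hosts = tempfile.mkdtemp()
copied = shutil.copy(original, os.path.join(hosts, 'mecab.py'))

names = ['unidic-mecab-translate']
found_from_repo = find_dictionaries(original, names)
found_from_copy = find_dictionaries(copied, names)
print(found_from_repo)
print(found_from_copy)
```

The script in the repo folder sees its dictionary, while the copy sees none, which matches the empty arrays described above.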

What exactly did you test, @YukiNagat0? If you mean that looking up words works with mecab enabled, that will indeed work, because Yomitan falls back to the simple method when mecab returns no results. In my case, though, I use the mecab result in the /tokenize endpoint, and I kept getting empty arrays. (By default the /tokenize endpoint never uses mecab; allowing tokenizing through mecab is part of my next PR to Yomitan, see the branch below.)

Could you check whether the extension directory on your machine contains the data folder with the dictionaries? In my case, the dictionaries are downloaded into the repo folder while mecab.py is copied into Chrome's extension folder, which of course breaks the dictionary scan when the script runs. If copying the dictionaries into Chrome's folder is the expected behaviour on Windows, we could also fix the issue that way.

This was pretty difficult to debug because the logs were written to Chrome's stderr, so I had to print to stderr to debug it (stdout is not logged in Chrome's output).
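
A minimal sketch of the kind of stderr logging this refers to. The `debug` helper and its message prefix are hypothetical, not part of mecab.py, and the comment about stdout reflects the understanding that Chrome's native messaging uses the host's stdout for replies.

```python
import sys


def debug(message, stream=None):
    """Write a debug line to stderr, or to `stream` when one is given (hypothetical helper)."""
    # Assumption: stdout carries the native messaging replies,
    # so diagnostics must not be printed there.
    out = sys.stderr if stream is None else stream
    print(f"[mecab debug] {message}", file=out, flush=True)


debug("no dictionaries found")
```

The optional `stream` argument only exists so the helper can be exercised without touching the real stderr.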

Cf screenshots.

By the way, to make /tokenize work with mecab, you'd also need this branch:
https://github.com/JSchoreels/yomitan/tree/feature/mecab-tokenizer-improvements
And this script tests the endpoint with both mecab and the parser, to see how they might differ:

tokenize_test.py

@JSchoreels
Author

@YukiNagat0, if after testing you find that this issue does not happen on Windows (that is, the data folder is found correctly), and the issue is therefore specific to the macOS handling, I can limit the change to the macOS logic with this diff instead:

diff --git a/install_mecab_for_yomitan.py b/install_mecab_for_yomitan.py
index 05d83af..915e614 100755
--- a/install_mecab_for_yomitan.py
+++ b/install_mecab_for_yomitan.py
@@ -21,7 +21,6 @@ import os
 import json
 import copy
 import zipfile
-import shutil
 if sys.version_info[0] == 3:
     from urllib.request import urlretrieve
 elif sys.version_info[0] == 2:
@@ -241,7 +240,7 @@ def main():
     if platform_data['platform'] == 'mac':
         script_path = os.path.join(manifest_install_data['path'], 'mecab.py')
         try:
-            shutil.copy(os.path.join(DIR, 'mecab.py'), script_path)
+            os.symlink(os.path.join(DIR, 'mecab.py'), script_path)
             print(f"File copied from {os.path.join(DIR, 'mecab.py')} to {script_path}")
         except FileNotFoundError:
             print("File not found.")
diff --git a/mecab.py b/mecab.py
index 85c2a5c..bcaf0a7 100755
--- a/mecab.py
+++ b/mecab.py
@@ -34,7 +34,8 @@ elif sys.version_info[0] == 2:
     import Queue as queue
     from itertools import izip_longest as zip_longest

-DIR = os.path.realpath(os.path.dirname(__file__))
+# First resolve __file__ itself (in case mecab.py is a symlink), then get its directory
+DIR = os.path.dirname(os.path.realpath(__file__))

@YukiNagat0

Well, I don't have a Mac, but I am sure this is a Mac-specific issue.
To make sure we are on the same page, I will describe what install_mecab_for_yomitan.py should do (on Linux/macOS):

  1. Create the NativeMessagingHosts/native-messaging-hosts folder.
  2. In that folder, create the yomitan_mecab.json manifest file, whose path key should point to mecab.py in the cloned repo folder, where install_mecab_for_yomitan.py and the other files are located.
  3. Install the dictionary data into the ./data/dict_name/ folder of the cloned repo.

On Windows the scheme is almost identical:

  1. Create the mecab_yomitan.bat file in the cloned repo folder (a workaround to call mecab.py located in the same cloned repo folder).
  2. Create the yomitan_mecab.json manifest file, again in the same folder.
  3. Create the registry entry that points to yomitan_mecab.json.

Now, why doesn't it work on macOS?
The problem is in the following lines:

# fix macOS user dictionary permission issue
if platform_data['platform'] == 'mac':
    script_path = os.path.join(manifest_install_data['path'], 'mecab.py')
    try:
        shutil.copy(os.path.join(DIR, 'mecab.py'), script_path)
        print(f"File copied from {os.path.join(DIR, 'mecab.py')} to {script_path}")

So on macOS we end up with a broken (half-baked) setup:
the NativeMessagingHosts folder contains the copied mecab.py and the yomitan_mecab.json manifest, whose path key points to NativeMessagingHosts/mecab.py.

However, ./mecabrc and ./data/dict_name/ are still in the cloned repo folder.


Now the question is how to solve this for macOS.
(Again, I don't have a Mac, so I can't test anything.)
My proposal is to resolve the "# fix macOS user dictionary permission issue" problem without copying mecab.py into NativeMessagingHosts and without changing script_path, that is, to delete the following lines altogether:

# fix macOS user dictionary permission issue
if platform_data['platform'] == 'mac':
    script_path = os.path.join(manifest_install_data['path'], 'mecab.py')
    try:
        shutil.copy(os.path.join(DIR, 'mecab.py'), script_path)
        print(f"File copied from {os.path.join(DIR, 'mecab.py')} to {script_path}")
    except FileNotFoundError:
        print("File not found.")
    except PermissionError:
        print("Permission denied.")
    except Exception as e:
        print(f"An error occurred: {e}")

If macOS requires the script to be placed in the NativeMessagingHosts folder, then we should copy (or move) all the files mecab needs (./mecabrc and ./data/dict_name/) into the NativeMessagingHosts folder.

@YukiNagat0

YukiNagat0 commented Nov 30, 2025

So the goal (on macOS) is to find a way to only create the NativeMessagingHosts folder with a yomitan_mecab.json manifest that points to the cloned repo folder where all the files are located (without copying anything or changing any paths or the DIR variable).

Or, if that is impossible, copy/move everything to the NativeMessagingHosts folder.

Making the mecab.py a symlink will not solve the issue because (if I am not mistaken) DIR (__file__) will still be resolved into the folder where symlink is located (NativeMessagingHosts) and not into the folder where it points to (cloned repo folder)

@JSchoreels
Author

JSchoreels commented Nov 30, 2025

> Making the mecab.py a symlink will not solve the issue because (if I am not mistaken) DIR (__file__) will still be resolved into the folder where symlink is located (NativeMessagingHosts) and not into the folder where it points to (cloned repo folder)

It does work, because of the os.path.realpath(__file__) in DIR = os.path.dirname(os.path.realpath(__file__)).

>>> os.path.realpath(os.path.dirname('mecab.py'))
'/Users/jschoreels/Library/Application Support/Google/Chrome/NativeMessagingHosts'
>>> os.path.dirname(os.path.realpath('mecab.py'))
'/Users/jschoreels/workspace/yomitan-mecab-installer'

Basically first get the realpath then the dirname, instead of doing the dirname then the realpath.
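
The difference can be reproduced in isolation. The sketch below uses two temporary folders as stand-ins for the cloned repo and NativeMessagingHosts (these paths and variable names are assumptions for the example) and builds the symlink with os.symlink, which may need extra privileges on Windows.

```python
import os
import tempfile

# Stand-ins for the cloned repository and the NativeMessagingHosts folder.
repo = os.path.realpath(tempfile.mkdtemp())
hosts = os.path.realpath(tempfile.mkdtemp())

target = os.path.join(repo, 'mecab.py')
open(target, 'w').close()
link = os.path.join(hosts, 'mecab.py')
os.symlink(target, link)

# dirname first: the symlinked file itself is never resolved, so we stay in 'hosts'.
dirname_then_realpath = os.path.realpath(os.path.dirname(link))
# realpath first: the symlink is resolved to the file in 'repo' before taking its folder.
realpath_then_dirname = os.path.dirname(os.path.realpath(link))

print(dirname_then_realpath == hosts)
print(realpath_then_dirname == repo)
```

Both comparisons hold, so only the realpath-first ordering leads back to the folder that contains the data directory.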

I'm testing it right now and it seems to work fine. It may well be the simplest solution here, since mecab.py just has to know where to look and this gives it the right location.

@JSchoreels
Author

Well, actually, putting the right path in yomitan_mecab.json works just as well:

{
    "name": "yomitan_mecab",
    "description": "MeCab for Yomitan",
    "type": "stdio",
    "path": "/Users/jschoreels/workspace/yomitan-mecab-installer/mecab.py",
    "allowed_origins": [
        "chrome-extension://likgccmbimhjbgkjambclfkhldnlhbnn/",
        "chrome-extension://glnaenfapkkecknnmginabpmgkenenml/",
        "chrome-extension://igbfpblkdooilgjjadkohgmcandjdmnf/"
    ]
}

So I guess we could use this approach to make it consistent across systems?
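
As a rough sketch of that approach (not the installer's actual code), the manifest can be generated so that its path key points straight at mecab.py in the cloned repo. The helper names `build_manifest` and `write_manifest`, the placeholder repo path, and the temporary output folder are all assumptions for illustration; the manifest fields are the ones shown above.

```python
import json
import os
import tempfile


def build_manifest(repo_dir):
    """Build a manifest whose 'path' points at mecab.py in repo_dir (hypothetical helper)."""
    return {
        "name": "yomitan_mecab",
        "description": "MeCab for Yomitan",
        "type": "stdio",
        "path": os.path.join(repo_dir, "mecab.py"),
        "allowed_origins": [
            "chrome-extension://likgccmbimhjbgkjambclfkhldnlhbnn/",
            "chrome-extension://glnaenfapkkecknnmginabpmgkenenml/",
            "chrome-extension://igbfpblkdooilgjjadkohgmcandjdmnf/",
        ],
    }


def write_manifest(manifest, hosts_dir):
    """Write yomitan_mecab.json into hosts_dir and return its path (hypothetical helper)."""
    os.makedirs(hosts_dir, exist_ok=True)
    manifest_path = os.path.join(hosts_dir, "yomitan_mecab.json")
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=4)
    return manifest_path


# Placeholder repo location and a temp folder standing in for NativeMessagingHosts.
repo_dir = "/path/to/yomitan-mecab-installer"
hosts_dir = tempfile.mkdtemp()
manifest_path = write_manifest(build_manifest(repo_dir), hosts_dir)
print(manifest_path)
```

With this layout nothing is copied: only the manifest lives in the hosts folder, and mecab.py keeps resolving DIR to the repo that holds mecabrc and data.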

@YukiNagat0

> Well actually putting the right link to the yomitan_mecab.json just work as fine

So, the solution is to delete these lines, right?

# fix macOS user dictionary permission issue
if platform_data['platform'] == 'mac':
    script_path = os.path.join(manifest_install_data['path'], 'mecab.py')
    try:
        shutil.copy(os.path.join(DIR, 'mecab.py'), script_path)
        print(f"File copied from {os.path.join(DIR, 'mecab.py')} to {script_path}")
    except FileNotFoundError:
        print("File not found.")
    except PermissionError:
        print("Permission denied.")
    except Exception as e:
        print(f"An error occurred: {e}")

@JSchoreels
Author

Well, the path generated in the file is incorrect:

{
    "name": "yomitan_mecab",
    "description": "MeCab for Yomitan",
    "type": "stdio",
    "path": "/Users/jschoreels/Library/Application Support/Google/Chrome/NativeMessagingHosts/mecab.py",
    "allowed_origins": [
        "chrome-extension://likgccmbimhjbgkjambclfkhldnlhbnn/",
        "chrome-extension://glnaenfapkkecknnmginabpmgkenenml/",
        "chrome-extension://igbfpblkdooilgjjadkohgmcandjdmnf/"
    ]
}

So I'll quickly start from scratch, remove the copy (the lines you mentioned), and put the right path in that JSON.

@YukiNagat0

> put the right path in that json

It should be already the right path: script_path = os.path.join(DIR, 'mecab.py')

…json. Copying mecab.py is not needed because of that
@JSchoreels JSchoreels force-pushed the fix/mecab-empty-result-relative-data-path branch from 62025d7 to bb2f8ea Compare November 30, 2025 12:55
@JSchoreels
Author

JSchoreels commented Nov 30, 2025

The script_path was overridden with one of the lines you removed

        'manifest_install_data': {
[...]
            'chrome': {
                'methods': ['file'],
                'path': os.path.expanduser('~/.config/google-chrome/NativeMessagingHosts/'),
            },
        }
[...]

So simply removing those 12 lines was in fact sufficient.

Thanks a lot for the investigation and for taking the time to explain how it was meant to work. Next time I'll ping you first, I guess :)

I've force-pushed the new patch.

I wonder what exactly the permission issue was that the previous contributor faced; it seems to work OK here. I thought maybe he had Google installed as a "global app" of some kind, but then copying mecab.py wouldn't have worked either...

@JSchoreels JSchoreels changed the title from "Use env variable to be sure the chrome script will target the dictionary folder" to "macos : Fix installation to correctly target the installation folder for mecab.py and dictionnaries" Nov 30, 2025
@JSchoreels JSchoreels changed the title from "macos : Fix installation to correctly target the installation folder for mecab.py and dictionnaries" to "macos : fix installation to correctly target the installation folder for mecab.py and dictionnaries" Nov 30, 2025
Member

@Kuuuube Kuuuube left a comment

Assuming this is all figured out now and good to go. Will wait a bit to merge in case there are any additional comments to be made.

@JSchoreels
Author

> Assuming this is all figured out now and good to go. Will wait a bit to merge in case there's any additional comments to be made.

Sure, thanks for the follow up

@Kuuuube Kuuuube merged commit e50d248 into yomidevs:master Dec 26, 2025