Fixed Exiftool command line overflowing max length #1052
jaimetur merged 2 commits into jaimetur:main from
Conversation
…es from temp file, not the command line
Exiftool works with both but the one they recommend is without the hyphen. Signed-off-by: stefan-sherwood <spammerblackhole-github@yahoo.com>
stefan-sherwood
left a comment
This should fix #1036
What exactly is the issue that you want to fix with this PR? What I remember is that using @ in the exiftool command caused other collateral effects; that's why I decided to remove it. On the other hand, I see that you have removed the charset option for Windows and forced it to always use UTF-8. This also has collateral effects on some filenames with special characters.
This fixes an error people are seeing where the tool fails to apply EXIF changes and leaves files in a bad state. It happens because every OS has a limit on the size of the command line you can pass when opening a process. Details of the symptom are outlined in the linked bug. Using UTF-8 for the filename is 100% safe on all operating systems, since that file is only used for reading filenames. I tested this on Windows, which is the only place where non-UTF-8 was being used anyway, and it 100% resolved the problem. I tested with 200,000 files on Windows (and also ran it with ~130,000 files in a different path); both failed before the change and worked without issue after it. This also opens the door to removing the 1,000-file-at-a-time batching, which should speed up the process a bit.
I remember that when I developed this part, I had many issues using exiftool with a list instead of passing each filename as an argument, so I decided to keep using arguments because that way, if some file fails, you can see exactly which file failed; with a list you cannot debug the failed file. On the other hand, using batches serves several purposes: one is splitting the number of files to shorten the argument list passed to exiftool, but another is parallelization, since each batch can be launched in a different thread and processing time is reduced a lot. Have you checked the performance before and after this patch?
The batch size of 1,000 was tested on my side with no issue, but under some conditions or on some OSes it should perhaps be reduced to 500. I still prefer to pass each file as an argument rather than use the list option because, if I'm not wrong, with a list you can't debug which files fail, and if one file fails the whole list fails and the assets need to be enqueued again without batches, ending up with slower performance. I had so many troubles with this option when I developed it; that's why I'm asking so much before approving your PR. Many thanks for it, BTW; I appreciate your effort a lot and I would need more help from other developers to continue improving the tool. BTW, have you tried it with the new web interface that I just released today? Running the tool with the web interface has several benefits: one is that it is easier to deal with so many arguments, and the second is that since you are running the tool on Docker, the backend is always running in Linux, and in Linux you will not have such limitations on the number of arguments passed to exiftool.
As I and others have reported, you cannot see which file(s) have failed as it currently stands. Nowhere in the log or error messages does it say what the command line was. In the case of large batches it would be impractical to include anyway, because it's 1,000 files at a time. Exiftool is optimized for, and very robust about, being passed files via indirection, and it is the standard way of processing batches of files. I did not do a performance test, but exiftool is documented to be much faster when used with indirection because by default it keeps the process open rather than spinning up a new one for every file.
On your other point, every OS has a limit on argument size. In Linux it is determined by the specific kernel as ARG_MAX. I just queried that on my Ubuntu machine and it reports this:
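For reference, the limit mentioned above can be queried on any POSIX system like this (the exact value printed varies by kernel and configuration, which is why it is omitted here):

```shell
# Print the maximum combined length, in bytes, of command-line
# arguments plus environment for a new process on this system.
getconf ARG_MAX
```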
I'm going to approve your PR; let's hope there is no performance impact from it. Thank you very much for your support and help. The changes will be applied in v3.8.0. I encourage you to try and test the new web interface and let me know how you find it. Thanks a lot.
- Adjusted Google Photos Panel layout.
- Enhancements in Execution Log window to parse the progress bar properly.
- Allow page reload while a task is running.
- Fixed Exiftool command line overflowing max length (#1052).
- Fixed Exiftool not embedded in docker-dev version.
- Other bug fixes.
- …ial-folder names (`Archive`, `Trash`, `Locked folder`) and abort early with a clear message; the same validation is enforced in Web UI folder selection (Issue #1008).
- Fixed Google Takeout GPTH input root normalization to handle direct `Google Photos` paths reliably (without copy fallback), preventing "Discover Media = 0" scenarios on NAS layouts (Issue #1013).
- Fixed `LocalFolder.get_albums_owned_by_user()` raising `UnboundLocalError` (`albums_filtered`) when called with `filter_assets=False` during automatic migration album checks (Issue #1014).
- Fixed Synology Live Photo downloads where ZIP payloads could be saved with a `.HEIF` extension; ZIP payloads are now detected and extracted properly (Issue #1028).
- Fixed download/migration EXIF date overwrite behavior: EXIF date tags are now only filled when missing, preserving existing shooting dates (Issue #1029).
- Fixed Exiftool command line overflowing max length (#1052).
- Fixed Exiftool not embedded in docker-dev version.
- Other bug fixes.
Now passing Exiftool file paths to be processed via indirection
→ This makes the total command-line length irrelevant
→ This allows an arbitrary number of files to be processed
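A minimal sketch of the indirection approach described above. The `-@ ARGFILE` and `-charset filename=UTF8` options are exiftool's documented flags; the helper name, file list, and args-file location are illustrative, not the project's actual code:

```python
import os
import tempfile


def build_exiftool_command(file_paths, argfile_path):
    """Write one path per line to a UTF-8 args file and return the
    exiftool command that reads those paths via -@ indirection."""
    with open(argfile_path, "w", encoding="utf-8", newline="\n") as f:
        for path in file_paths:
            f.write(path + "\n")
    # -charset filename=UTF8 tells exiftool that filenames (including
    # those in the args file) are UTF-8, which matters mainly on Windows.
    return ["exiftool", "-charset", "filename=UTF8", "-@", argfile_path]


# Illustrative usage: the argv stays tiny no matter how many files are
# listed, because the paths live in the temp file, not the command line.
paths = [f"photo_{i:06d}.jpg" for i in range(200_000)]
argfile = os.path.join(tempfile.gettempdir(), "exiftool_args.txt")
cmd = build_exiftool_command(paths, argfile)
# subprocess.run(cmd) would invoke exiftool; omitted in this sketch.
```

Because the argument list has a fixed size, this sidesteps ARG_MAX (and the Windows CreateProcess limit) entirely, which is what makes the per-1,000-file batching unnecessary for length reasons.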