Commit c375996

Document why bytes filenames matter
CJK locales on Windows produce Shift-JIS, GBK, or EUC-KR filenames that aren't valid UTF-8. When those files land on Linux (or Docker, or WSL), os.listdir() called with a bytes path returns their names as bytes. The comment records the use case so future readers don't remove the bytes handling.
1 parent 17540bc commit c375996

1 file changed

Lines changed: 3 additions & 0 deletions

src/binaryornot/helpers.py

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@ def has_binary_extension(filename: str | bytes | Path) -> bool:
     :param filename: File path to check.
     :returns: True if the extension is in the known binary list.
     """
+    # bytes filenames matter for CJK locales (Shift-JIS, GBK, EUC-KR):
+    # files created on Windows with a CJK locale produce non-UTF-8 names
+    # that os.listdir() returns as bytes on Linux/Docker/WSL.
     if isinstance(filename, bytes):
         filename = os.fsdecode(filename)
     p = Path(filename) if not isinstance(filename, Path) else filename
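
The bytes branch in the hunk above can be illustrated with a minimal sketch. It assumes a POSIX system whose filesystem encoding is UTF-8, where os.fsdecode() uses the surrogateescape error handler. The helper extension_of() and the sample filename are hypothetical and are not part of binaryornot; they only mirror the isinstance/os.fsdecode pattern from the diff.

```python
import os
from pathlib import Path


def extension_of(filename):
    """Hypothetical helper: return the lowercase suffix of a str, bytes, or Path name."""
    if isinstance(filename, bytes):
        # Same conversion as in the diff: bytes -> str via the filesystem encoding.
        filename = os.fsdecode(filename)
    return Path(filename).suffix.lower()


# A sample name encoded as Shift-JIS, the way it might sit on disk.
raw_name = "テスト.PNG".encode("shift_jis")

# Check whether the raw bytes would survive a strict UTF-8 decode.
try:
    raw_name.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# os.fsdecode() keeps undecodable bytes as lone surrogates (on the assumed
# platform), so the ASCII extension stays readable and the name round-trips.
decoded = os.fsdecode(raw_name)
print(is_valid_utf8)
print(extension_of(raw_name))
print(os.fsencode(decoded) == raw_name)
```

Because the undecodable bytes are preserved rather than replaced, the decoded name can still be passed back to os.fsencode() or open() to reach the same file, which is why decoding up front is enough for an extension check.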