Original bug ID: 6695
Reporter: @whitequark
Assigned to: @whitequark
Status: closed (set by @xavierleroy on 2016-12-07T10:37:18Z)
Resolution: fixed
Priority: normal
Severity: minor
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Related to: #3771 #6692 #6694 #6697
Monitored by: @gasche @hcarty
Bug description
Currently, ocamlc uses String.capitalize and String.uncapitalize extensively when deriving filenames from module names and vice versa. These functions treat strings as ISO-8859-1 and attempt to case-fold non-ASCII letters such as \248 (ø).
Today, none of the operating systems OCaml supports consistently encodes paths as ISO-8859-1. Most platforms use UTF-8, and Windows uses a locale-specific encoding. Many UTF-8 lead bytes fall in the range that ISO-8859-1 treats as accented letters. Folding such a byte therefore rewrites part of a multi-byte character. As a result, this case-folding is practically always broken: the derived name contains garbage whenever its first character is not US-ASCII.
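A minimal OCaml sketch of the corruption follows. The Latin-1 `String.uncapitalize` is deprecated in later releases, so the sketch does not call it. Instead it emulates the old folding with two hypothetical helpers, `latin1_lowercase` and `latin1_uncapitalize`. These are assumed to mirror the ISO-8859-1 ranges of the pre-4.03 `Char.lowercase`. The sketch then contrasts that folding with `String.uncapitalize_ascii`, available since OCaml 4.03.

```ocaml
(* Hypothetical emulation of the Latin-1 Char.lowercase: maps A-Z and the
   ISO-8859-1 capitals \192-\214 and \216-\222 to the code 32 higher. *)
let latin1_lowercase (c : char) : char =
  if (c >= 'A' && c <= 'Z')
     || (c >= '\192' && c <= '\214')
     || (c >= '\216' && c <= '\222')
  then Char.chr (Char.code c + 32)
  else c

(* Hypothetical emulation of the Latin-1 String.uncapitalize:
   only the first byte of the string is folded. *)
let latin1_uncapitalize (s : string) : string =
  String.mapi (fun i c -> if i = 0 then latin1_lowercase c else c) s

(* Module name "Øre" encoded as UTF-8: 'Ø' is U+00D8, i.e. bytes \195\152. *)
let module_name = "\195\152re"

let () =
  (* Latin-1 folding turns the UTF-8 lead byte \195 into \227, so the
     derived filename stem is "\227\152re", which is not valid UTF-8. *)
  Printf.printf "latin1: %S\n" (latin1_uncapitalize module_name);
  (* ASCII-only folding leaves every non-ASCII byte untouched. *)
  Printf.printf "ascii:  %S\n" (String.uncapitalize_ascii module_name)
```

ASCII-only folding does not lowercase "Ø" either. It does, however, leave the UTF-8 bytes intact rather than producing an invalid sequence.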
This is a separate issue from #6694. Not only is the impact in this case very clear and the scope limited to the compiler, but the current behavior is also more clearly broken.