Do not treat paths as encoded in ISO-8859-1 #6695

@vicuna

Original bug ID: 6695
Reporter: @whitequark
Assigned to: @whitequark
Status: closed (set by @xavierleroy on 2016-12-07T10:37:18Z)
Resolution: fixed
Priority: normal
Severity: minor
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Related to: #3771 #6692 #6694 #6697
Monitored by: @gasche @hcarty

Bug description

Currently, ocamlc uses String.capitalize and String.uncapitalize extensively when deriving filenames from module names and vice versa. These functions treat the strings as ISO-8859-1, and attempt to case-fold letters such as \248 (ø).

Today, no operating system that OCaml supports consistently encodes paths as ISO-8859-1. Rather, UTF-8 is used on sane platforms, and a locale-specific encoding on Windows. Thus, this case-folding is practically always broken, and the derived name will contain garbage if the first letter is outside US-ASCII.
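
The breakage can be sketched as follows, assuming a UTF-8 encoded module name. `latin1_lowercase` and `latin1_uncapitalize` are hypothetical stand-ins written for this illustration to mimic the Latin-1 behavior described above; they are not standard library functions. `String.uncapitalize_ascii` (available in OCaml 4.03 and later) is used only as a contrast, since it touches ASCII letters alone.

```ocaml
(* Hypothetical helper: lowercase one byte under ISO-8859-1 rules,
   i.e. A-Z plus the accented uppercase ranges 192-214 and 216-222. *)
let latin1_lowercase c =
  match c with
  | 'A' .. 'Z' | '\192' .. '\214' | '\216' .. '\222' ->
      Char.chr (Char.code c + 32)
  | _ -> c

(* Hypothetical stand-in for a Latin-1 aware uncapitalize:
   only the first byte of the string is changed. *)
let latin1_uncapitalize s =
  String.mapi (fun i c -> if i = 0 then latin1_lowercase c else c) s

(* "Øre" encoded as UTF-8: Ø is U+00D8, stored as the bytes 0xC3 0x98. *)
let module_name = "\xC3\x98re"

let () =
  let broken = latin1_uncapitalize module_name in
  let ascii = String.uncapitalize_ascii module_name in
  (* The Latin-1 variant rewrites the UTF-8 lead byte 0xC3 to 0xE3,
     which no longer forms a valid UTF-8 sequence with what follows. *)
  Printf.printf "latin1: %S\nascii:  %S\n" broken ascii
```

Under these assumptions, the Latin-1 variant turns the lead byte 0xC3 into 0xE3 and so corrupts the name, while the ASCII-only variant leaves the bytes unchanged.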

This is a separate issue from #6694. Not only is the impact in this case very clear and the scope limited to the compiler, but the current behavior is also more clearly broken.
