Skip to content

Fix naming of shared Unicode stubs#1363

Merged
gasche merged 2 commits intoocaml:trunkfrom
nojb:fix_1200_naming
Sep 27, 2017
Merged

Fix naming of shared Unicode stubs#1363
gasche merged 2 commits intoocaml:trunkfrom
nojb:fix_1200_naming

Conversation

@nojb
Copy link
Copy Markdown
Contributor

@nojb nojb commented Sep 21, 2017

This PR fixes a naming oversight in #1200. There are three functions used in C code shared between Unix and Windows to support Unicode under Windows which are called

  • caml_stat_strdup_to_utf16 [converts from OCaml-side to OS-side]
  • caml_stat_strdup_of_utf16 [converts from OS-side to OCaml-side]
  • caml_copy_string_of_utf16 [converts from OS-side to OCaml-side]

These names make no sense under Unix (where they are just synonyms for caml_stat_strdup, caml_stat_strdup, and caml_copy_string respectively). Let us rename them to something more neutral. In #1200 @dra27 had suggested using _os as suffix (I lost track of the suggestion back then):

  • caml_stat_strdup_to_os
  • caml_stat_strdup_of_os
  • caml_copy_string_of_os

Opinions ?

While on this topic, it may be worthwhile to revisit the other names introduced by #1200, see here. For example, the name _texecve stands for execve under Unix and _wexecve under Windows. The reason for this naming choice was partly due to it being like this in the original Unicode PR and partly because it follows Windows conventions. But I would be happy to hear any suggestions for improvements.

Also: the changes in #1357 will need to be updated to take into account this PR.

@gasche @alainfrisch @dra27

@alainfrisch
Copy link
Copy Markdown
Contributor

*_to_os seems good to me.

What about execve_os, and char_os instead of wchar_t?

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 21, 2017

execve_os and the like seem fine to me. For the character type we use charnat but it should be renamed char_os for consistency, indeed.

I will wait a little before updating the PR to give more time in case there are other opinions.

#ifdef _WIN32

static wchar_t * read_section_to_utf16(int fd, struct exec_trailer *trail, char *name)
static wchar_t * read_section_to_os(int fd, struct exec_trailer *trail, char *name)
Copy link
Copy Markdown
Contributor

@alainfrisch alainfrisch Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's where I found wchar_t. Should this be charnat (resp. char_os) for consistency?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this minor point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@nojb nojb changed the base branch from 4.06 to trunk September 21, 2017 15:00
@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 21, 2017

Rebased on trunk.

@damiendoligez damiendoligez added this to the 4.06.0 milestone Sep 22, 2017
@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 23, 2017

I still have some questions about the use of caml_stat_strdup{,of_os,to_os} functions from my newbie perspective:

  • What are the reasons to use caml_stat_strdup directly instead of caml_stat_strdup_of_os? My guess would be that it is when you really want char * instead of oschar *, but it's not clear to me that it actually happens within the compiler distribution. Furthermore, there remain 25 uses of caml_stat_strdup within otherlibs/unix (git grep "caml_stat_strdup(" otherlibs/unix), versus 11 uses of caml_stat_strdup_to_utf16, can all of these be converted to to_os? (Or is there a good reason for the inconsistency?)

  • same question with caml_copy_string_of_os: when do we not want to use it?

  • caml_stat_strdup has a _noexc variant. Would such variants be useful for the new _{of,to}_os functions and other GC-using functions introduced by this patch?

  • when do we want to use caml_stat_strdup_of_os? Currently there is one use in debugger.c that I don't understand, one use in win32.c, and all other uses are to be passed to a %s format (usually in an error function, so the allocation is never explicitly collected)

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 23, 2017

Thanks for the questions @gasche ! Indeed, simplifications along the lines you mention may be possible.

  • Basically the intention with the _os-suffixed functions vs the non-suffixed functions was to make it explicit that the argument (or result) was being received from (or sent to) the OS and had a particular encoding. On the other hand, caml_{stat_strdup,copy_string} can be used to copy arbitrary byte buffers, but as you point out this may not actually happen in the compiler distribution.

  • Another point was to make the diff as small as possible and make it clear that the parts of the unix library that are not shared with Windows remain unmodified.

  • For caml_stat_strdup_of_os you are right there are not many uses. This function was introduced more as a way to keep the diff small by using existing code that assumed char *, but indeed it could be eliminated by some further refactoring.

All in all I agree that it is very appealing to try to get rid of the _os variations and just have caml_stat_strdup and caml_copy_string. I will experiment along these lines and report the results.

Just to summarize your suggestion is to change the signature of caml_{stat_strdup,copy_string} to

CAMLextern os_char * caml_stat_strdup(const char *s);
CAMLextern value caml_copy_string (os_char const *);

(where os_char is char on Unix and WCHAR on Windows), and use that everywhere.

@xavierleroy
Copy link
Copy Markdown
Contributor

All in all I agree that it is very appealing to try to get rid of the _os variations and just have caml_stat_strdup and caml_copy_string

No way. It's perfectly legitimate to copy bytes as just byte with no OS-specific encoding implied. That the need doesn't show up in today's (Windows version of) the runtime system is an accident.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 23, 2017

What I think would be useful to have by release time are clear guidelines on how to write portable string-manipulating C code for the OCaml runtime (inside the distribution or in user's bindings), and hopefully to have these guidelines be followed by the code in the compiler distribution.

(Another interesting question is whether we understand the backward-compatibility impact of these guidelines, that is the ability to write code that works under both old and new OCaml versions. I suppose that currently the answer is "wait until the oldest version you want to support has _os functions, and switch only then".)

@xavierleroy
Copy link
Copy Markdown
Contributor

The _os names look OK to me, by the way. Concerning the other question:

While on this topic, it may be worthwhile to revisit the other names introduced by #1200, see here. For example, the name _texecve stands for execve under Unix and _wexecve under Windows.

I'm not a big fan of those function names that start with an underscore, first because they scream "Windows!", second because the ISO C99 standard has a few choice words to say about them (7.1.3):

— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.

The Windows names such as _wexecve fit this spec, mind you. Still, so many restrictions make me nervous. So, I welcome alternative ideas for names, although I have none to offer at the moment.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 25, 2017

What about using the _os suffix for all such symbols?

  • char_os for the character type char / WCHAR,
  • execve_os for execve/_wexecve, and so on, and
  • caml_stat_strdup_{of,to}_os for caml_stat_strdup, etc.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 25, 2017

It seems fine to me.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of this change. Would a third person have a look, and merge if they also agree?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 26, 2017

On the other hand, I believe that this needs to be rebased on top of the just-merged #1357 before merging.

@alainfrisch
Copy link
Copy Markdown
Contributor

It's hard to do a useful review of such mostly mechanical renaming, but this all looks good to me, and @xavierleroy validated the terminology, so I think this is good to merge after rebasing. (There is also just one pending minor comment.)

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 26, 2017

Incidentally the Travis failures seem to be unrelated to the Unicode stuff, having to do with the symbol caml_atom_table.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 27, 2017

Note: this will have to be rebased again when #1362 is merged.

The caml_atom_table failure is a separate issue that already occurs in trunk, so we will handle it in another place. It should not block merging this PR.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 27, 2017

LGTM too and ready to rebase now that #1362 is merged.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

Rebased on trunk.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

Anyone wants to restart Travis, or merge directly ?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 27, 2017

I restarted the CI build. We have long been waiting to see the greens!

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 27, 2017

@nojb: could you maybe add this PR number to the Changes entry about the Windows runtime? It would be nice if someone following the PRs could get the whole picture -- especially given that currently those discussions and sources are the only source of documentation for the C interface.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

Sure, makes sense.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

OK, added all related PRs #1357, #1362, #1363 to Changes.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

All green! Merge?

@gasche gasche merged commit 050ceef into ocaml:trunk Sep 27, 2017
@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 27, 2017

Thanks!

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 27, 2017

Huzzah - well done @nojb!

gasche added a commit that referenced this pull request Sep 27, 2017
Fix naming of shared Unicode stubs
@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 27, 2017

Merged, and cherry-picked in 4.06 ( 6882353 ).

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 27, 2017 via email

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* Create invitation-to-contribute

* Rename invitation-to-contribute to invitation-to-contribute.md

* remove kind from title space

* added description field

* Delete invitation-to-contribute.md

* Create invitation-to-contribute.md

* fix image link

* image for contribute

* added calendar dates and Discord link

* Update data/news/ocamlorg/invitation-to-contribute.md

Co-authored-by: sabine <sabine@users.noreply.github.com>

* Update data/news/ocamlorg/invitation-to-contribute.md

Co-authored-by: sabine <sabine@users.noreply.github.com>

---------

Co-authored-by: sabine <sabine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants