add resource-handling IO functions in pervasives#640
add resource-handling IO functions in pervasives#640c-cube wants to merge 4 commits intoocaml:trunkfrom
Conversation
stdlib/pervasives.ml
Outdated
| g x; | ||
| raise e | ||
|
|
||
| let open_out_gen mode perm name = |
There was a problem hiding this comment.
This does not have the right semantics if g raises. Better to use
match f x with
| res -> g x; res
| exception e -> g x; raise eThere was a problem hiding this comment.
Noting the various comments about this below, I realise that the uses of finally_ here shouldn't cause this semantics error, but is there any reason to leave this erroneous version of finally_ in and not replace it with @nojb's better suggestion (which doesn't try to call g twice if g raises)?
There was a problem hiding this comment.
Indeed, fixed. I hadn't paid enough attention to that.
There was a problem hiding this comment.
The same finally_ is defined in otherlibs/threads/pervasives.ml as well. I think it would be better to define it just once in pervasives and expose it, then you can call it in threads and not risk to have different implementations lying around.
|
Maybe this would be a good time to add a module like |
|
@hcarty indeed, I would love a @nojb indeed, it's not clear what should happen if Also, I need to write some tests, obviously (especially for |
stdlib/pervasives.mli
Outdated
| characters have been read. | ||
| @since 4.02.0 *) | ||
|
|
||
| val input_string_all : in_channel -> string |
There was a problem hiding this comment.
I find the name of this function ugly. Here are a few alternative proposals input_contents, string_of_in_channel. in_channel_to_string.
There was a problem hiding this comment.
I have no strong opinion, but it sounds like string_of_in_channel would fit well within the stdlib. I'd like more people to give their opinion on this, but the name will probably change indeed.
There was a problem hiding this comment.
We already have the pattern of input_line, input_byte etc. We also have the odd really_input_string. Maybe input_file, implying that you're reading the whole file?
Another function I use is input_lines, which creates a list of all the lines.
There was a problem hiding this comment.
@bluddy it doesn't have to be a file :-)
But indeed, input_lines would be nice too, it's often very convenient. In the absence of iterators(!), returning a list is acceptable.
Is this about possible clashes of unit names? There is indeed currently no good story about that. One possible direction is to always use names such as camlFoo for new modules, and provide perhaps a default map file so that user code can simply refer to Foo. Another direction is to add sub-modules to Pervasives (e.g. Pervasives.IO), but this is not very coherent with the current design, and does not allow to benefit from the removal of unused units at link time. |
|
@alainfrisch precisely, that is my concern (I'd also like to add an Maybe with module aliases, have |
|
@c-cube if we're going to have submodules via aliases, how about Std as the parent module? Much better than Pervasives IMO. Let's start making this thing right. Actually, C++ is considering having a new stdlib and incrementing the postfix number. This would potentially work really nicely for us with modules and allow us to increment the postfix: Std would be a logical remaking of the current standard library, and draw from the existing modules. We would at some point create an Std2, using combinations of possibly the same underlying modules, and not have to worry about backwards compatibility (since Std will still be there), etc. |
|
@bluddy my goal here is not to engineer a new, incompatible stdlib… I just want more things in it :p |
|
@c-cube Since you've already started this effort, how about another PR with moving everything to the Std module (or whichever one you prefer), while keeping everything old for backwards compatibility? Also, possibly reorganizing things to make more sense (e.g. things in Unix should be specifically Unix), getting rid of Stream in the new Std etc. When/if we want to remake the stdlib, we move to Std2 if needed. @alainfrisch since you brought up the concept of changes to stdlib, what do you think of this idea? |
I'm not sure I follow your proposal. What would go in Std? Only aliases to new Caml* modules and to existing modules? Or actual code? If we want to propose a whole new structure for the stdlib, one should use this opportunity for redesigning many aspects (exceptions, naming, use of optional arguments) and cleaning old stuff (removing deprecated modules and functions). I'm not sure this can be achieved with a simple Github PR, though. |
|
edit: wrong PR -_- |
|
@alainfrisch Mostly I'm trying to bounce ideas around. For years there's been this de facto approach (justified or not) of the standard library being a monolithic thing that cannot be changed. And now we've breached that wall, and I think we're (or I am) investigating the consequences. So true, we want to add some functionality. But as soon as we add it (and I'm fine with going ahead and doing it), we bump into the issue of the stdlib and its relatively bad module name choices, the fact that adding more modules will conflict with user modules, etc. So I want to think about ideas of moving away from the current position to one where the stdlib occupies mainly one global, short module name (something like Std), and where module names make sense. And we can do that with backwards compatibility since everything else will remain. Additionally, we don't necessarily need to make the first iteration perfect (ie. the enemy of the good) because we can increment the posfix and have Std2, Std3 etc. as needed (within reason, of course). These would all be modules with module aliases to other modules (though namespaces of course would also be nice if we ever get them). So a first move might be just shifting all the current functionality to Std and organizing it better. A further move might be a more long-term project of designing a new stdlib with improved functions, and placing everything in Std2. This will be very hard to do in anything resembling a stable library, so Std can be the stable library and for a while Std2 will be entirely experimental. Etc. |
Really, nothing changed in the policy w.r.t. evolutions of the standard library. All previous releases already accepted contributions. The recent addition to CONTRIBUTING.md was simply to counter a widespread misconception about the stdlib being stalled, or somehow tied to specific needs of the compiler itself. I think it's more productive to focus on lifting specific limitations of the stdlib than trying to experiment with very different designs, which is better left to external projects. |
@alainfrisch That has been the approach so far, but I'm questioning whether it has to be the approach. If our assumption is that we have to live with the decisions made in the standard library forever, that simply isn't true if you have a single 'namespace' like Std, which you can increment when a reboot is needed. And many people look to the standard library for guidance -- they don't want to choose between Core, Batteries or Containers. Why not use the experience gained from creating these external libraries to make a new, clean standard library? Or at the very least, for a single iteration, moving everything into Std and cleaning up the cruft that has built up over time? |
The answer to this question is mostly lack of manpower. I would very much like to see a good proposal for versioning the standard library, designing an improved version, then deprecating the current version, but that's a large amount of work and I don't know who's going to do it. |
That only works for reads. For writes to ensure the data is actually written successfully one needs to check errors for both fsync() and close(). This would require returning both the result of f and the exception thrown by fsync or close. I believe the current design would lead to people not checking for errors. |
|
@mrvn the last commit uses |
abcc272 to
62d3337
Compare
|
Note: had to bootstrap because |
|
I just saw https://ocaml.org/learn/tutorials/file_manipulation.html, and am a bit puzzled that in 2017, the recommended way of manipulating files in OCaml is unsafe. The current code is: open Printf
let file = "example.dat"
let message = "Hello!"
let () =
(* Write message to file *)
let oc = open_out file in (* create or truncate file, return channel *)
fprintf oc "%s\n" message; (* write something *)
close_out oc; (* flush and close the channel *)
(* Read file and display the first line *)
let ic = open_in file in
try
let line = input_line ic in (* read line from in_channel and discard \n *)
print_endline line; (* write the result to stdout *)
flush stdout; (* write on the underlying device now *)
close_in ic (* close the input channel *)
with e -> (* some unexpected exception occurs *)
close_in_noerr ic; (* emergency closing *)
raise e (* exit with error: files are closed but
channels are not flushed *)but imho it would be much better as: open Printf
let file = "example.dat"
let message = "Hello!"
let () =
(* Write message to file *)
with_open_out file (fun oc -> fprintf oc "%s\n" message);
(* Read file and display the first line *)
with_open_in file (fun ic ->
let line = input_line ic in
print_endline line;
flush stdout); |
|
I don't think we should add anything to pervasive unless the benefit is huge |
|
I'd argue the benefit in this case is huge. But even then, we can consider having a sub-module |
|
I had the impression that most people write their own (possibly flawed) version of |
|
So, any chance this makes it into 4.05? :-/ |
|
This seems unlikely until #1010 is merged (and there are still a couple of issues being worked on around that) |
|
Is there any chance of In fact, I had come here to make a PR to add it but it looks like it's already part of this PR (only hidden) |
|
@alainfrisch (and any other dev team member interested) just to confirm are we still ok with that plan ? If that is the case I will try to cook something up in time for |
|
I think it is a good idea, but mrvn's comment about how we check for errors should be addressed. We have discussed at lengths the perils of raising during clean-up during I think the behaviour should be consistent between normal and exceptional return paths, that is to say silently ignore exceptions arising during close. You will object, but then how will one ever be able to reliably handle and reason about errors? For that, the programmer can build on it an abstraction with explicit commit semantics, depending on the application. Here's an example that shows how one can do so in the case of temporary files where one wants 1) to only keep the file if valid, and 2) that the file counts as valid only if some programmer-specified success path completes without error. Moreover, this is implemented by building on top on module type Output = sig
type channel
val with_open : string -> (channel -> 'a) -> 'a
val output_string : channel -> string -> unit
val flush : channel -> unit
val create_file : string -> unit
val remove_file : string -> unit
end
module type Temp_file = sig
type t
(* removes temp file when in a non-properly committed state *)
val with_temp_file : string -> (t -> 'a) -> 'a
val output_string : t -> string -> unit
(* raise if committing fails *)
val commit : t -> unit
end
module Temp_file (Output : Output) : Temp_file = struct
type t = { channel : Output.channel ; valid : bool ref }
let with_sentinel filename f =
let _ = Output.create_file filename in
let valid = ref false in
let finally () = if not (!valid) then Output.remove_file filename in
Fun.protect ~finally (fun () -> f valid)
let with_temp_file filename f =
with_sentinel filename @@ fun valid ->
Output.with_open filename @@ fun channel ->
f { channel ; valid }
let output_string { channel ; valid } string =
valid := false ;
Output.output_string channel string
let commit { channel ; valid } =
Output.flush channel ;
valid := true
endThere might be other strategies for other applications, but for a general-purpose |
|
@gadmm just to be clear I'm not going to add the resource handling functions. Just do the work to put what is in currently |
|
@dbuenzli, ok, then my message is addressed to anyone wants to finish the current PR after that. |
|
Reading the discussion again, it seems the (only?) argument in favor of putting all functions in the same module is to keep using verbs for reading/writing functions without adding redundancy ( Other points to be discussed:
A deeper point is whether we are satisfied enough with current channels to "modernize" their API. If the new API is to be deprecated soon, it wouldn't be so good. One could prefer to use this redesign effort as an opportunity to provide better alternatives. Two directions come to mind:
(How this would interact with Marshal remains to be seen...) |
|
@alainfrisch My proposal was rather to provide an easy migration path (basically qualify your unqualified uses of channel functions with The questions you raise are certainly legitimate and interesting but you are basically calling for a redesign on how IO is done in OCaml and I don't think they can be answered in the time I can personally allocate on this. Also I'm not sure it's a good idea to do that now and/or even fundamentally change the way things are named now. So if someone else wants to pickup the ball, I'll personally leave it here for now. |
|
I think before we can redesign channels and such we need to have a solid idea of how versioning will work on the stdlib. Once we have versioning, it opens up other possibilities that are currently painful to deal with, including getting rid of a bunch of deprecated stuff. |
|
@dbuenzli Are you referring to the "deeper point", or to discussion on naming and other superficial points? I think it would really be a shame to just move existing function to For the bigger redesign, it's probably out of reach in the short term unless someone volunteers (and that might be better experimented outside the Stdlib anyway); we will likely live with the current implementation for some time anyway, so feel free to ignore this part. |
|
Personnally I value backward compatibility more than a brand new redesign. Don't break code at all (yes, recent threads have been written about that). All I wanted with this PR was to upstream some of the functions I use all the time in containers (in an |
|
@c-cube : nobody suggests breaking backward compatibility here. The points I raised (except for the "deeper question", but let's exclude it) are not about a brand new design, but about streamlining the existing one. If we don't even allow ourselves discussing such points while shuffling the surface API, we could as well continue throwing everything in Stdlib directly. The extra time spent on discussing the API of e.g. open functions seems relevant at this point, including for your initial proposal (which inherits the need to expose 3 variants). |
|
@alainfrisch I tend to consider even a deprecation warning as breaking, these days, as |
|
I did not even discuss the addition of deprecation warnings! This question is orthogonal to the points I raised. |
otherlibs/threads/stdlib.ml
Outdated
| | [] -> l2 | ||
| | x :: tail1 -> rev_append tail1 (x :: l2) | ||
| in | ||
| rev_append l [] |
There was a problem hiding this comment.
Can't invocations of this be replaced with List.rev_append l []?
There was a problem hiding this comment.
It's the same issue as having to redefine protect instead of using Fun.protect. I don't know exactly the internals, but afaik Stdlib cannot refer to the other modules.
Yes I was. I also think new forms of IO should likely been experimented outside the stdlib first.
Maybe you are right but I'm wondering whether the streamlining you mention is worth doing at all. It seems rather minor points and/or things deeply rooted in the design like |
Meta-point: I know from feedback I've received that the |
I agree having We still need a value
Seems logical to do so. Should we be concerned about the cost of this when reading a channel byte-per-byte (using
I thought these functions were supposed to be used in
I wonder if this is really needed. |
So one would replace |
|
I'm for names that are as short as possible so long as they express everything necessary.
However, how about we merge these functions first, and then talk about a new module design? |
I would actually do ...
let open Out_channel in
...
write_string stdout "FOO"
...It's not that bad imho |
- safe `with_open_{in,out}` functions
- add `input_lines` to read all the lines into a list
- also modify otherlibs/threads/pervasives.ml
|
I believe this is superseded by #8937 |
|
The string_of_in_channel implementation could probably use Unix.stats then read st_size in the returned value and allocate a buffer of the exact size in one go. |
…ml#640) Co-authored-by: Sabine Schmaltz <sabine@tarides.com>
Two things here:
in_channelinto a string. This is yet another piece of code that gets rewritten literally everywhere.