Conversation
dra27
left a comment
There was a problem hiding this comment.
These functions are a great idea - they should also go in the functorial interface, I think?
|
Note the gotchas when using |
Thanks for the pointer. I added suitable (re-)definitions for |
|
LGTM. Could you add unit tests? |
|
I believe |
Thanks, done. |
Done. |
|
CI green! Seems like an easy one to review. Anyone willing to approve? |
|
I have one further comment, which is to do with the initial size of the Regardless of that, isn't initialising the hash table to twice the length of the list likely to result in a better load factor? |
The only problem is that it breaks the idiom
I'm not an expert; is 2 the best factor? |
|
This PR seems to be consensual; anyone up to approve it formally? |
stdlib/hashtbl.mli
Outdated
|
|
||
| val of_list : ('a * 'b) list -> ('a, 'b) t | ||
| (** [Hashtbl.of_list l] is a table built from the given bindings. The bindings | ||
| are added in the same order they appear in [l], using {!add}. |
There was a problem hiding this comment.
of_list uses add, of_seq uses replace. I don't know which one is better, but the inconsistency is unfortunate.
There was a problem hiding this comment.
Indeed. add seems more natural to me, but I agree we should use the same function for both.
| (** [Hashtbl.to_list h] is the list of bindings in [h]. The order in which the | ||
| bindings appear in the list is unspecified. However, if the table contains | ||
| several bindings for the same key, they appear in order of introduction, | ||
| that is, the most recent binding appears last. |
There was a problem hiding this comment.
Might be worth adding a test to check this property.
|
Oops - thanks for catching that 😊. I agree that |
|
Given that 4.08 includes some critical fixes to |
|
Ping. Can we reach an agreement on whether |
|
I do think there should be a consistency between If it's documented, I can live with the difference between |
|
Let's try to converge on this issue. To summarize, we have the following options :
What do you think we should do ? @dra27 @alainfrisch @c-cube @gasche To me it feels either 1 or 4 are probably best. |
|
I think that @c-cube's initial design of having both I think you should add |
Would that be:
(1) significantly increases the verbosity of code such as (2) might work better in practice, but it feels a bit ad hoc or at least foreign to current stdlib idioms (jQuery users would feel at home, though). |
|
It would be I think that your concern could be partially alleviated with a I guess you write |
Rather
The point was to preprocess the association list into a hashtable and return a fast lookup function which can be used many times. (Yes, |
|
What do you think of having |
|
I don't have strong opinions against it, but to me the problem is not only to choose add/replace, but also to pass the table-creation parameters (this was also a problem with #2202 which started by addin |
|
I stand by my original design. I very rarely use |
Independently of whether it is a good idea to have both, I don't like the names, which mix a functional view ("translate" from one data representation to another) and a procedural one (a verb describing how insertion is done, imperatively). As for the idea of having both functions, I share @gasche views (also not having a strong opinion).
I rarely use Again, I don't have a strong opinion, but if we had to choose only one sematics, the one based on I don't think this will bring anything to the discussion, but just to mention another possible direction: allow specifying at |
|
I agree with @c-cube, and as a side point, I think I also think the The same thing applies to an initial size in the particular |
It's tangential to this PR, but I looked into this a while ago. The compiler doesn't build in this mode, and it was utterly non-trivial to figure out why. At some point, I'll break that branch open again, if no one else does, but there be dragons... |
|
@alainfrisch's idea may be a reasonable compromise?! |
|
I agree with @gasche that the API makes much more sense with |
6091fb0 to
c671d24
Compare
OK, I pushed a new version adding the following functions (one "list" function for each existing "seq" function) : val add_list : ('a, 'b) t -> ('a * 'b) list -> unit
val replace_list : ('a, 'b) t -> ('a * 'b) list -> unit
val of_list : ('a * 'b) list -> ('a, 'b) t
val to_list : ('a, 'b) t -> ('a * 'b) list
val to_list_keys : ('a, 'b) t -> 'a list
val to_list_values : ('a, 'b) t -> 'b listwhere What do you think? Is it worth adding these specialised functions, or do we decide that we should just use the "seq" functions + |
|
I like |
|
PR seems to have stalled, even though I was under the impression that there was support, modulo naming, for its latest iteration (but I may be misremembering though). It consists in adding the following functions to Can some of the other developers that participated in the discussion chime in? Otherwise I would like to close this PR. @dra27 @gasche @alainfrisch @dbuenzli UPDATE: updated |
|
I'm not excited by the functions proposed in this PR.
Note: another approach would be to ensure that we can use List.fold_left Hashtbl.add_pair t li
List.fold_left_pair Hashtbl.add t li
List.fold_left (Fun.on_pair Hashtbl.add) t lifor Hashtbl.fold_pair List.cons []
Hashtbl.fold List.cons_pair []
Hashtbl.fold (Fun.curry List.cons) []I'm actually somewhat interested in |
Thanks for chiming in :) I tend to agree with your arguments. My own experience points to In any case, I think you made important points (which extend beyond this particular PR) to keep in mind for future evolutions of the stdlib. |
|
I might propose |
No description provided.